BACKGROUND OF THE INVENTION

The U.S. Government may own rights in the application pursuant to funding from the 
National Institutes of Health (DK49835). 

1. Field of the Invention

The present invention relates to the fields of biochemistry, cellular biology and molecular 
biology. More particularly, it relates to the field of protein biochemistry, and specifically, to the 
use of an assay for determining protein folding and solubility.

2. Description of Related Art 

There is a wide variety of potential applications for a genetic system enabling rapid and
efficient evaluation of protein solubility characteristics in vivo. One of the cornerstones of
biotechnology is the ability to express target proteins in functional form in vivo in
genetically-engineered organisms. However, many important target proteins are not efficiently
expressed in soluble form in bacteria such as E. coli, due at least in part to the complexity of the
protein folding process in vivo (Houry et al., 1999). When a target protein
fails to be expressed in soluble form in vivo, the yield of soluble protein can often be improved
by optimizing various factors such as the primary sequence of the target protein (Huang et al.,
1996) or the genetic background or growth conditions of the bacterium (Hung et al., 1998;
Brown et al., 1997; Blackwell & Horgan, 1991; Bourot et al., 2000; Sugihara & Baldwin, 1988;
Wynn et al., 1992). However, existing assays for protein expression in soluble form are tedious,
usually requiring lysis and fractionation of cells followed by protein analysis by
SDS-polyacrylamide gel electrophoresis. Using this traditional approach, screening for protein
constructs and/or physiological conditions yielding improved solubility is inefficient, and genetic
selection is impossible.

Protein folding diseases represent a second area in which protein solubility characteristics
are of vital medical and technological importance (Thomas et al., 1995; Dobson, 1999). These
diseases, which have proven particularly refractory to pharmaceutical development, are caused
either by misfolding of a protein during biosynthesis subsequent to acquiring some mutation
(Brown et al., 1997; Thomas et al., 1992; Rao et al., 1994) or by aberrant protein processing
leading to the formation of an aggregation-prone product, such as the peptide forming the
amyloid plaques associated with Alzheimer's disease (Tan & Pepys, 1994; Harper & Lansbury,
1997), SOD1 in amyotrophic lateral sclerosis (Bruijn et al., 1998), α-synuclein in Parkinson's
disease (Galvin et al., 1983), amyloid A and P deposits in systemic amyloidosis (Hind et al.,
1983), transthyretin fibrils in fatal familial insomnia (Colon & Kelly, 1992) and the intranuclear
inclusions associated with polyglutamine expansions which cause Huntington's disease (Martin
& Gusella, 1986; HDCRG, 1993; Davies et al., 1997), spinocerebellar ataxia (Wells & Warren,
1998), spinobulbar muscular atrophy (La Spada et al., 1991), and Machado-Joseph disease
(Kawaguchi et al., 1994). The ability to rapidly and efficiently screen for protein solubility in
vivo could also be applied to the development of assays for pharmaceutical compounds
preventing the misfolding or aggregation of proteins involved in protein folding diseases (i.e.,
assays for compounds that prevent precipitation of such aggregation-prone proteins).

Thus, there remains a need in the field for improved methods of screening for protein
folding and solubility.

SUMMARY OF THE INVENTION 

The present invention involves the use of a genetic system based on structural
complementation (Richards & Vithayathil, 1959; Ullmann et al., 1967; Taniuchi & Anfinsen,
1971; Zabin & Villarejo, 1975; Pecorari et al., 1993; Schonberger et al., 1996) of a selectable
marker protein as the basis of a direct in vivo solubility assay. Structural
complementation involves the division of a protein into two component segments which must be
combined to form a stable and fully functional structure. The specific implementation of the
method is an adaptation of the classic α-complementation system of β-galactosidase (β-gal)
(Ullmann et al., 1967). However, the same concept could potentially be applied to other
selectable genetic markers like chloramphenicol transacetylase or even screenable markers like
the green fluorescent protein (although appropriately complementing fragments of these proteins
would have to be developed first). β-gal can be divided into two fragments (α and ω) capable of
associating with each other to form an active enzyme (Ullmann et al., 1967). Redistribution of
the α-fragment from the soluble to the insoluble fraction in E. coli cells leads to a reduction in
the level of β-gal activity, which can be assayed either during growth on indicator agar plates
using the chromogenic substrate X-gal, or in suspension culture. Fusion of the α-fragment to the
C-terminus of a target protein leads to the formation of a chimeric protein with solubility
properties similar to those of the target protein alone. Thus, β-gal activity levels report the
solubility of the target fusion. By contrast, three extant systems for monitoring solubility and
misfolding in vivo rely on the use of fusions with the full-length marker proteins β-gal (Lee et al.,
1990), GFP (Waldo et al., 1999) and CAT (Maxwell et al., 1999). It is well documented that the
solubility properties of protein fusions to intact marker enzymes tend to be dominated by the
solubility properties of the marker enzyme, as evidenced by the use of MBP (Ko et al., 1993;
Kapust et al., 1999), thioredoxin (Papouchado et al., 1997), and GST (Wang et al., 1999) fusions
to enhance the solubility of some otherwise insoluble protein constructs. The colorimetric
plate assay described here should be readily adapted to efficient high-throughput screening.

Thus, there is provided a method for assessing protein folding and/or solubility
comprising (a) providing an expression construct comprising (i) a gene encoding a fusion protein,
said fusion protein comprising a protein of interest fused to a first segment of a marker protein,
wherein said first segment does not affect the folding or solubility of the protein of interest, and
(ii) a promoter active in said host cell and operably linked to said gene, (b) expressing said fusion
protein in a host cell that also expresses a second segment of said marker protein, wherein said
second segment is capable of structural complementation with said first segment, and (c)
determining structural complementation, wherein a greater degree of structural complementation,
as compared to structural complementation observed with appropriate negative controls,
indicates proper folding and/or solubility of said protein.
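
By way of illustration only, the comparison against negative controls in step (c) might be sketched as follows. This is a minimal sketch and not a description of the inventors' procedure: the function name, the example activity values, and the use of a mean plus k standard deviations cutoff are all assumptions introduced here.

```python
from statistics import mean, stdev

def exceeds_negative_controls(sample_activity, control_activities, k=3.0):
    """Return True if the sample's marker activity is greater than
    mean + k * SD of the negative-control activities.

    The cutoff rule and the default k are illustrative assumptions."""
    if len(control_activities) < 2:
        raise ValueError("need at least two negative-control measurements")
    threshold = mean(control_activities) + k * stdev(control_activities)
    return sample_activity > threshold

# Hypothetical marker activities in arbitrary units.
controls = [10.0, 12.0, 11.0]
print(exceeds_negative_controls(85.0, controls))
print(exceeds_negative_controls(12.5, controls))
```

Any other statistically reasonable comparison could be substituted for the cutoff used here.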

The fusion may be N- or C-terminal to said protein of interest. The marker protein may
be selected from the group consisting of a target binding protein, an enzyme, a protein inhibitor,
and a chromophore. Examples include ubiquitin, green fluorescent protein, blue fluorescent
protein, yellow fluorescent protein, luciferase, aequorin, β-galactosidase, cytochrome c,
chymotrypsin inhibitor, RNase, phosphoglycerate kinase, invertase, staphylococcal nuclease,
thioredoxin C, lactose permease, aminoacyl tRNA synthetase, and dihydrofolate reductase. In the
particular case of β-galactosidase, the first segment is the α-peptide of β-galactosidase, and said
second segment is the ω-peptide of β-galactosidase. In certain embodiments the marker protein
is associated with a detectable phenotype, including enzymatic activity, chromophore or
fluorophore activity.

The protein of interest may be Alzheimer's amyloid peptide (Aβ), SOD1, presenilin 1
and 2, α-synuclein, amyloid A, amyloid P, CFTR, transthyretin, amylin, lysozyme, gelsolin, p53,
rhodopsin, insulin, insulin receptor, fibrillin, α-ketoacid dehydrogenase, collagen, keratin,
PRNP, immunoglobulin light chain, atrial natriuretic peptide, seminal vesicle exocrine protein,
β2-microglobulin, PrP, precalcitonin, ataxin 1, ataxin 2, ataxin 3, ataxin 6, ataxin 7, huntingtin,
androgen receptor, CREB-binding protein, dentatorubral pallidoluysian atrophy-associated
protein, maltose-binding protein, ABC transporter, glutathione S transferase, and thioredoxin.

The gene encoding the second segment may be carried on a chromosome of said host cell
or episomally. The host cell may be a bacterial cell, an insect cell, a yeast cell, a nematode cell,
or a mammalian cell. Examples include E. coli, C. elegans, or S. frugiperda, and a variety of
mammalian cells. Preferred promoters include Tac promoter, T7 promoter, or Plac promoter
(bacterial), Cup, ADH, Gal (yeast) or PepCk or tk (mammalian).

In a particular embodiment, the method utilizes a negative control that is a host cell lacking
the second segment of said marker protein and/or a fusion protein that is improperly folded
and/or insoluble.

In another embodiment, there is provided a method for screening protein folding and/or
solubility mutants comprising (a) providing an expression construct comprising (i) a gene
encoding a fusion protein comprising a protein of interest and a first segment of a marker
protein, wherein said first segment does not affect the folding or solubility of the protein of
interest, and (ii) a promoter active in said host cell and operably linked to said gene, wherein
said fusion protein is not properly folded and/or soluble when expressed in said host cell,
(b) mutagenizing that portion of the gene encoding said protein of interest, (c) expressing
said fusion protein in a host cell that expresses a
second segment of said marker protein, wherein said second segment is capable of structural
complementation with said first segment, and (d) determining structural complementation,
wherein a relative increase in structural complementation, as compared to the structural
complementation observed with the unmutagenized fusion protein, indicates an increase in
proper folding and/or solubility of said protein.
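
The comparison in step (d) can be pictured as a fold-change calculation relative to the unmutagenized fusion. The sketch below is illustrative only; the function name, the clone labels, the activity values, and the two-fold cutoff are hypothetical and are not taken from the specification.

```python
def improved_mutants(parent_activity, mutant_activities, min_fold=2.0):
    """Rank mutants whose marker activity is at least `min_fold` times
    that of the unmutagenized (parent) fusion.

    `mutant_activities` maps a clone label to its measured activity.
    The cutoff is an illustrative assumption."""
    if parent_activity <= 0:
        raise ValueError("parent activity must be positive")
    folds = {name: act / parent_activity
             for name, act in mutant_activities.items()}
    hits = [(name, fold) for name, fold in folds.items() if fold >= min_fold]
    # Highest fold increase first.
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical activities in arbitrary units.
print(improved_mutants(10.0, {"m1": 15.0, "m2": 40.0, "m3": 25.0}))
```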

In yet another embodiment, there is provided a method for screening for a candidate modulator
substance that modulates protein folding and/or solubility comprising (a) providing an
expression construct comprising (i) a gene encoding a fusion protein, said fusion protein
comprising a protein of interest fused to a first segment of a marker protein, wherein said first
segment does not affect the folding or solubility of the protein of interest, and (ii) a promoter
active in said host cell and operably linked to said gene, (b) expressing said fusion protein in a
host cell that expresses a second segment of said marker protein, wherein said second segment is
capable of structural complementation with said first segment, (c) contacting the host cell with
said candidate modulator substance, and (d) determining structural complementation, wherein a
relative change in structural complementation, as compared to the structural complementation
observed in the absence of said candidate modulator substance, indicates that said candidate
modulator substance is a modulator of protein folding and/or solubility. The candidate
modulator substance may be a protein, a nucleic acid or a small molecule.

Following long-standing patent language convention, the terms "a" or "an," when used in
conjunction with "comprising," may mean one or more than one in the description and
claims herein.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings form part of the present specification and are included to further 
demonstrate certain aspects of the present invention. The invention may be better understood by 
reference to one or more of these drawings in combination with the detailed description of 
specific embodiments presented herein.

FIGS. 1A and 1B. An in vivo solubility assay based on structural complementation. (FIG.
1A) A schematic depicting the complementation solubility assay. P (squares) represents the
target protein, and α (triangles) and ω (trapezoids) represent the complementing
fragments of the tetrameric β-galactosidase. Brackets indicate the concentration dependence of
the assay on the availability of soluble (folded) target/α fusion. Kd is indicated solely to
highlight the concentration-dependent equilibrium association/dissociation reaction. (FIG. 1B)
A schematic representation of the target protein/α-fragment C-terminal fusion expression
construct (α-fragment, residues 7-58 from full-length β-galactosidase). "HA" indicates the
position of the inserted influenza hemagglutinin (HA) immuno-tag (residue sequence
YPYDVPDYA) present in some of the constructs examined.

FIG. 2. Correlation of β-galactosidase activity with fusion protein solubility and folding.
A scatter plot correlating the in vitro β-galactosidase activity measured in cell lysates (see Table
1) with the fraction soluble (open circles) and the reported periplasmic yield (filled squares) for
each of the MBP/α-fragment fusion proteins examined.

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS 

Protein misfolding is the basis of a number of human diseases. It also presents a sizable
obstacle to the production of functional recombinant proteins. In addition, there is a tremendous
potential to modulate in vivo function of proteins by modulating protein folding. To date, the
study of misfolding and its circumvention has required development of specific assays for each
individual case.

A general method for monitoring folding and solubility would avoid this limitation. However,
for maximum utility, such a method should provide an easily measured signal,
be sensitive to subtle changes in the solubility of the target protein over a wide concentration
range, allow phenotypic selection of the soluble protein, and have minimal effect on the
solubility of the target protein. The present invention offers each of these advantages.

The present invention utilizes generalized fusion constructs and the phenomenon of
"structural complementation" to examine protein folding and/or solubility in cell- or
organism-based screening. In a particular embodiment, the α and ω peptides of β-galactosidase
are used, the first as a fusion partner for a given protein of interest, in a complementation
assay. Where the protein of interest is properly folded, the fusion remains soluble and can
associate with the other peptide of β-galactosidase, permitting enzyme activity and detection.
A variety of different host cells, "structural complementation" pairs (enzymes, binding
proteins, chromophores) and target proteins can be used.
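
The concentration dependence of the complementation signal can be illustrated with a simple binding calculation. The sketch below assumes a one-to-one α:ω association governed by a single dissociation constant Kd; this is a deliberate simplification (the active enzyme is tetrameric), and the function name and numerical values are hypothetical.

```python
import math

def complex_concentration(alpha_total, omega_total, kd):
    """Equilibrium concentration of the alpha:omega complex for a
    simple 1:1 association (an assumed, simplified model).

    Solves Kd = [alpha_free][omega_free] / [complex] using the
    physically meaningful root of the resulting quadratic.
    All quantities are in the same concentration units."""
    if kd <= 0:
        raise ValueError("kd must be positive")
    s = alpha_total + omega_total + kd
    return (s - math.sqrt(s * s - 4.0 * alpha_total * omega_total)) / 2.0

# Less soluble alpha-fusion gives less complex, hence less signal.
for soluble_alpha in (0.0, 0.5, 1.0, 2.0):
    print(soluble_alpha, complex_concentration(soluble_alpha, 1.0, 0.5))
```

Under this model, moving the α-fusion from the soluble to the insoluble fraction lowers the soluble α concentration and therefore the amount of complemented enzyme.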

The studies presented herein demonstrate that this system reliably reports on the
solubility of eight fused target proteins: the maltose binding protein and mutants thereof, the
first nucleotide binding domains of the cystic fibrosis transmembrane conductance regulator and
the branched chain amino acid transporter from the hyperthermophilic archaeon Methanococcus
jannaschii, and the Aβ peptide of the Alzheimer's precursor protein. The fact that the signal
produced by the fusions is proportional to the solubility of the nucleotide binding domain targets
when expressed without the α-fragment indicates that this relatively small polypeptide does not
significantly affect the solubility of the target protein, unlike fusions to a larger marker protein
(e.g., MBP, Harper and Lansbury, 1997). This could provide a significant advantage over two
recently reported solubility monitoring systems that rely on fusions with larger soluble proteins,
namely full-length β-gal (Lee et al., 1990), GFP (Waldo et al., 1999) and CAT (Maxwell et al.,
1999). It is well-documented that fusions with highly soluble proteins such as GST (Wang et al.,
1999), MBP (Ko et al., 1993), thioredoxin (Papouchado et al., 1997), and the
immunoglobulin binding domain (GB1) (Huth et al., 1997) significantly improve the solubility
properties of a variety of expressed proteins. Thus, it is reasonable to expect that in some cases,
GFP and CAT may have a significant effect on the solubility of the target.

As mentioned above, this system has several potential uses. For example, recombinant
production systems can be tested to determine if the polypeptide to be produced is properly
folded. In addition, target proteins may be diagnostic of disease states. The system also could
find utility in the development and selection of bacterial strains particularly effective at
expressing and folding heterologous proteins, or for phenotypic selection of a wide variety of
proteins in their study by random mutagenesis. These powerful approaches currently are limited
to proteins which themselves are required for a measurable cellular function. Thus, the present
solubility detection system provides an important avenue for understanding fundamental
biological processes such as how primary sequence directs the formation of a unique
three-dimensional structure, or the identity and mechanisms of cellular systems important for
efficient protein maturation.

One aspect of the invention is the minimal impact of the fusion partners on the protein of
interest. The presence of only "systematic" effects (i.e., similar both in the presence and absence
of either drug or mutation) on the solubility of the target permits ready comparison. This
provides the added advantage of being able to adjust the sensitivity of the assay
depending on the target protein of interest. The recent discovery of mutations in the α subunit
permits "tuning" of the α-ω interaction, which also can be used for altering the sensitivity.

Perhaps the most exciting application of the system is the discovery of drugs which
modulate the folding of disease-related proteins. Previously, the search for pharmaceuticals has
focused on the identification of compounds which inhibit cellular processes. However, the
increasing prevalence of diseases associated with protein misfolding such as Huntington's
disease, Alzheimer's disease, Parkinson's disease, cystic fibrosis, amyotrophic lateral sclerosis,
Creutzfeldt-Jakob disease, and some forms of diabetes and cancer presents a new challenge for
the pharmaceutical industry. The identification of drugs which target proteins with a propensity
to misfold requires the development of novel screening and assay methodologies such as the
α-complementation system described herein. Encouraging evidence that such pharmaceuticals
may be identified has recently been provided by Rastinejad and co-workers (Foster et al., 1999)
who reported the identification of a class of compounds which stabilized a folding mutant of p53
in a soluble and functional conformation, thereby rescuing its ability to prevent tumor growth in
mice.

Various aspects of the invention are described, in greater detail, in the following pages. 

A. Protein Folding and Mutant Proteins 

Several diseases, such as Alzheimer's disease, Parkinson's disease, Huntington's disease,
and others are thought to be the result of, or associated with, protein misfolding in vivo. In certain
embodiments, the present invention provides a method of assaying for the presence of protein
misfolding in a living cell.

Proteins expressed through recombinant means often misfold, particularly in prokaryotic
host cells that lack the processing machinery of a eukaryotic cell. When a protein misfolds, it
often becomes less soluble, and may precipitate in the cell as an inclusion body. Additionally,
mutations in naturally occurring proteins may increase the rate of misfolding when endogenously
expressed, as well as when exogenously expressed in a recombinant host cell. In certain
embodiments, the present invention allows various mutations, whether natural or produced by
the hand of man, to be assayed for their ability to increase or decrease protein misfolding in vivo.

1. Fusion Proteins 

An aspect of the present invention is the discovery that peptides, polypeptides or proteins,
useful for alpha complementation, may be joined to a larger soluble protein, polypeptide or
peptide, wherein the folding reaction is dominated by the soluble protein, polypeptide or peptide.
The soluble protein, peptide or polypeptide may have the same length or amino acid sequence as
the endogenously produced protein, polypeptide or peptide. In other embodiments, the soluble
protein, peptide or polypeptide may be a truncated protein, protein domain or protein fragment of
a larger peptide chain. For example, the soluble fragments of a membrane-embedded
or otherwise hydrophobic protein may be used to create a fusion protein.

Fusion proteins are produced by operatively linking at least one nucleic acid encoding at
least one amino acid sequence to at least a second nucleic acid encoding at least a second amino
acid sequence, so that the encoded sequences are translated as a contiguous amino acid sequence
either in vitro or in vivo. Fusion protein design and expression are well known in the art, and
methods of fusion protein expression are described herein, and in references, such as, for
example, U.S. Patent 5,935,824, incorporated herein by reference.

In certain embodiments, a peptide, polypeptide or protein may be joined at or near the
N-terminal or C-terminal end of a soluble protein, peptide or polypeptide. In certain embodiments,
it is contemplated that the alpha complementing peptide or polypeptide may be attached to the
soluble protein, peptide or polypeptide via a linker moiety. One such linker is another peptide,
such as described in U.S. Patent 5,990,275, incorporated herein by reference.

2. Mutagenesis 

Where employed, mutagenesis will be accomplished by any of a variety of standard mutagenic
procedures. Mutation is the process whereby changes occur in the quantity or structure of the
genetic material of an organism. Mutation can involve modification of the nucleotide sequence
of a single gene, blocks of genes or whole chromosomes. Changes in single genes may be the
consequence of point mutations, which involve the removal, addition or substitution of a single
nucleotide base within a DNA sequence, or they may be the consequence of changes involving
the insertion or deletion of large numbers of nucleotides.
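
The three kinds of point mutation named above (removal, addition and substitution of a single base) can be shown on a short sequence. The helper names and the example sequence below are hypothetical and serve only to make the definitions concrete.

```python
def substitute_base(seq, pos, base):
    """Replace the base at 0-based position `pos` with `base`."""
    return seq[:pos] + base + seq[pos + 1:]

def insert_base(seq, pos, base):
    """Add `base` immediately before 0-based position `pos`."""
    return seq[:pos] + base + seq[pos:]

def delete_base(seq, pos):
    """Remove the base at 0-based position `pos`."""
    return seq[:pos] + seq[pos + 1:]

# Hypothetical six-base sequence.
seq = "ATGGCC"
print(substitute_base(seq, 3, "A"))
print(insert_base(seq, 3, "T"))
print(delete_base(seq, 3))
```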

Mutations can arise spontaneously as a result of events such as errors in the fidelity of
DNA replication or the movement of transposable genetic elements (transposons) within the
genome. They also are induced following exposure to chemical or physical mutagens. Such
mutation-inducing agents include ionizing radiations, ultraviolet light and a diverse array of
chemicals such as alkylating agents and polycyclic aromatic hydrocarbons, all of which are
capable of interacting either directly or indirectly (generally following some metabolic
biotransformations) with nucleic acids. The DNA lesions induced by such environmental agents
may lead to modifications of base sequence when the affected DNA is replicated or repaired and
thus to a mutation. Mutation also can be site-directed through the use of particular targeting
methods.

a. Random Mutagenesis 
i) Insertional Mutagenesis

Insertional mutagenesis is based on the inactivation of a gene via insertion of a known
DNA fragment. Because it involves the insertion of some type of DNA fragment, the mutations
generated are generally loss-of-function, rather than gain-of-function mutations. However, there
are several examples of insertions generating gain-of-function mutations (Oppenheimer et al.,
1991). Insertion mutagenesis has been very successful in bacteria and Drosophila (Cooley et al.,
1988) and recently has become a powerful tool in corn (Schmidt et al., 1987), Arabidopsis
(Marks et al., 1991; Koncz et al., 1990), and Antirrhinum (Sommer et al., 1990).



Transposable genetic elements are DNA sequences that can move (transpose) from one
place to another in the genome of a cell. The first transposable elements to be recognized were
the Activator/Dissociation elements of Zea mays. Since then, they have been identified in a wide
range of organisms, both prokaryotic and eukaryotic.

Transposable elements in the genome are characterized by being flanked by direct repeats
of a short sequence of DNA that has been duplicated during transposition and is called a target
site duplication. Virtually all transposable elements, whatever their type and mechanism of
transposition, make such duplications at the site of their insertion. In some cases the number of
bases duplicated is constant; in other cases it may vary with each transposition event. Most
transposable elements have inverted repeat sequences at their termini. These terminal inverted
repeats may be anything from a few bases to a few hundred bases long, and in many cases they
are known to be necessary for transposition.

Prokaryotic transposable elements have been most studied in E. coli and other Gram-negative
bacteria, but also are present in Gram-positive bacteria. They are generally termed insertion
sequences if they are less than about 2 kb long, or transposons if they are longer.
Bacteriophages such as mu and D108, which replicate by transposition, make up a third type of
transposable element. Elements of each type encode at least one polypeptide, a transposase,
required for their own transposition. Transposons often further include genes coding for functions
unrelated to transposition, for example, antibiotic resistance genes.

Transposons can be divided into two classes according to their structure. First,
compound or composite transposons have copies of an insertion sequence (IS) element at each end,
usually in an inverted orientation. These transposons require transposases encoded by one of
their terminal IS elements. The second class of transposon has terminal repeats of about 30
base pairs and does not contain sequences from IS elements.

Transposition usually is either conservative or replicative, although in some cases it can
be both. In replicative transposition, one copy of the transposing element remains at the donor 
site, and another is inserted at the target site. In conservative transposition, the transposing 
element is excised from one site and inserted at another. 

Eukaryotic elements also can be classified according to their structure and mechanism of
transposition. The primary distinction is between elements that transpose via an RNA
intermediate, and elements that transpose directly from DNA to DNA.

Elements that transpose via an RNA intermediate often are referred to as
retrotransposons, and their most characteristic feature is that they encode polypeptides that are
believed to have reverse transcriptase activity. There are two types of retrotransposon. Some
resemble the integrated proviral DNA of a retrovirus in that they have long direct repeat
sequences, long terminal repeats (LTRs), at each end. The similarity between these
retrotransposons and proviruses extends to their coding capacity. They contain sequences related
to the gag and pol genes of a retrovirus, suggesting that they transpose by a mechanism related to
a retroviral life cycle. Retrotransposons of the second type have no terminal repeats. They also
code for gag- and pol-like polypeptides and transpose by reverse transcription of RNA
intermediates, but do so by a mechanism that differs from that of retrovirus-like elements.
Transposition by reverse transcription is a replicative process and does not require excision of an
element from a donor site.

Transposable elements are an important source of spontaneous mutations, and have
influenced the ways in which genes and genomes have evolved. They can inactivate genes by
inserting within them, and can cause gross chromosomal rearrangements either directly, through
the activity of their transposases, or indirectly, as a result of recombination between copies of an
element scattered around the genome. Transposable elements that excise often do so imprecisely
and may produce alleles coding for altered gene products if the number of bases added or deleted
is a multiple of three.
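
The multiple-of-three condition reflects the triplet genetic code: a net change of three, six, nine or more bases in multiples of three leaves the downstream reading frame intact. A minimal sketch of that check follows; the function name is hypothetical, and treating additions and deletions as a single net change is an assumption made for simplicity.

```python
def preserves_reading_frame(bases_added, bases_deleted):
    """Return True if the net length change left by an imprecise
    excision is a multiple of three, so that codons downstream of
    the site are read in their original frame."""
    if bases_added < 0 or bases_deleted < 0:
        raise ValueError("base counts must be non-negative")
    return (bases_added - bases_deleted) % 3 == 0

print(preserves_reading_frame(3, 0))
print(preserves_reading_frame(0, 6))
print(preserves_reading_frame(2, 0))
```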

Transposable elements themselves may evolve in unusual ways. If they were inherited
like other DNA sequences, then copies of an element in one species would be more like copies in
closely related species than copies in more distant species. This is not always the case,
suggesting that transposable elements are occasionally transmitted horizontally from one species
to another.

ii) Chemical Mutagenesis
Chemical mutagenesis offers certain advantages, such as the ability to find a full range of
mutant alleles with degrees of phenotypic severity, and is facile and inexpensive to perform. The
majority of chemical carcinogens produce mutations in DNA. Benzo[a]pyrene,
N-acetoxy-2-acetylaminofluorene and aflatoxin B1 cause GC to TA transversions in bacteria and
mammalian cells. Benzo[a]pyrene also can produce base substitutions such as AT to TA. N-nitroso
compounds produce GC to AT transitions. Alkylation of the O4 position of thymine induced by
exposure to N-nitrosoureas results in TA to CG transitions.
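
The terms used above follow the standard definitions: a transition exchanges a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), while a transversion exchanges a purine for a pyrimidine or vice versa. The short sketch below classifies a single-base change on one strand; the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref, alt):
    """Classify a single-base change on one strand as a
    'transition' or a 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"

# Base-pair changes from the text, written for one strand.
print(classify_substitution("G", "T"))  # GC to TA
print(classify_substitution("A", "T"))  # AT to TA
print(classify_substitution("G", "A"))  # GC to AT
print(classify_substitution("T", "C"))  # TA to CG
```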

A high correlation between mutagenicity and carcinogenicity is the underlying assumption
behind the Ames test (McCann et al., 1975), which speedily assays for mutants in a bacterial
system, together with an added rat liver homogenate, which contains the microsomal cytochrome
P450, to provide the metabolic activation of the mutagens where needed.

In vertebrates, several carcinogens have been found to produce mutation in the ras
proto-oncogene. N-nitroso-N-methyl urea induces mammary, prostate and other carcinomas in rats
with the majority of the tumors showing a G to A transition at the second position in codon 12 of
the Ha-ras oncogene. Benzo[a]pyrene-induced skin tumors contain an A to T transversion in the
second codon of the Ha-ras gene.

iii) Radiation Mutagenesis
The integrity of biological molecules is degraded by ionizing radiation. Absorption
of the incident energy leads to the formation of ions and free radicals, and breakage of some
covalent bonds. Susceptibility to radiation damage appears quite variable between molecules,
and between different crystalline forms of the same molecule. It depends on the total
accumulated dose, and also on the dose rate (as once free radicals are present, the molecular
damage they cause depends on their natural diffusion rate and thus upon real time). Damage is
reduced and controlled by making the sample as cold as possible.

Ionizing radiation causes DNA damage and cell killing, generally proportional to the
dose rate. Ionizing radiation has been postulated to induce multiple biological effects by direct
interaction with DNA, or through the formation of free radical species leading to DNA damage.
These effects include gene mutations, malignant transformation, and cell killing. Although
ionizing radiation has been demonstrated to induce expression of certain DNA repair genes in
some prokaryotic and lower eukaryotic cells, little is known about the effects of ionizing
radiation on the regulation of mammalian gene expression (Borek, 1985). Several studies have
described changes in the pattern of protein synthesis observed after irradiation of mammalian
cells. For example, ionizing radiation treatment of human malignant melanoma cells is
associated with induction of several unidentified proteins (Boothman et al., 1989). Synthesis of
cyclin and co-regulated polypeptides is suppressed by ionizing radiation in rat REF52 cells, but
not in oncogene-transformed REF52 cell lines (Lambert and Borek, 1988). Other studies have
demonstrated that certain growth factors or cytokines may be involved in x-ray-induced DNA
damage. In this regard, platelet-derived growth factor is released from endothelial cells after
irradiation (Witte et al., 1989).

In the present invention, the term "ionizing radiation" means radiation comprising
particles or photons that have sufficient energy or can produce sufficient energy via nuclear
interactions to produce ionization (gain or loss of electrons). An exemplary and preferred
ionizing radiation is an x-radiation. The amount of ionizing radiation needed in a given cell
generally depends upon the nature of that cell. Typically, an effective expression-inducing dose
is less than a dose of ionizing radiation that causes cell damage or death directly. Means for
determining an effective amount of radiation are well known in the art.

In certain embodiments, an effective expression-inducing amount is from about 2 to
about 30 Gray (Gy) administered at a rate of from about 0.5 to about 2 Gy/minute. Even more
preferably, an effective expression-inducing amount of ionizing radiation is from about 5 to
about 15 Gy. In other embodiments, doses of 2-9 Gy are used in single doses. An effective dose
of ionizing radiation may be from 10 to 100 Gy, with 15 to 75 Gy being preferred, and 20 to 50
Gy being more preferred.
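
As a purely arithmetic illustration of the ranges quoted above, exposure time follows from
dividing the total dose by the dose rate. The sketch below is not part of the disclosed method;
the function name exposure_minutes is hypothetical, and it assumes a constant dose rate over the
whole exposure.

```python
def exposure_minutes(dose_gy: float, rate_gy_per_min: float) -> float:
    """Return the irradiation time (minutes) needed to deliver dose_gy
    at a constant dose rate (an assumption of this sketch)."""
    if dose_gy <= 0 or rate_gy_per_min <= 0:
        raise ValueError("dose and dose rate must be positive")
    return dose_gy / rate_gy_per_min


# Bounds quoted in the text: about 2 to 30 Gy at about 0.5 to 2 Gy/minute.
shortest = exposure_minutes(2.0, 2.0)   # lowest dose at the fastest rate
longest = exposure_minutes(30.0, 0.5)   # highest dose at the slowest rate
print(f"exposure time ranges from {shortest} to {longest} minutes")
```

Under these assumptions the quoted ranges correspond to exposures of roughly one minute to one
hour.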

Any suitable means for delivering radiation to a tissue may be employed in the present
invention in addition to external means. For example, radiation may be delivered by first
providing a radiolabeled antibody that immunoreacts with an antigen of the tumor, followed by
delivering an effective amount of the radiolabeled antibody to the tumor. In addition,
radioisotopes may be used to deliver ionizing radiation to a tissue or cell.

iv) In Vitro Scanning Mutagenesis
Random mutagenesis also may be introduced using error-prone PCR (Cadwell and Joyce,
1992). The rate of mutagenesis may be increased by performing PCR in multiple tubes with
dilutions of templates.

One particularly useful mutagenesis technique is alanine scanning mutagenesis, in which
a number of residues are substituted individually with the amino acid alanine so that the effects
of losing side-chain interactions can be determined, while minimizing the risk of large-scale
perturbations in protein conformation (Cunningham et al., 1989).
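
The bookkeeping behind an alanine scan can be sketched in a few lines. This is a minimal
illustration only, not the procedure of the cited reference: the function name alanine_scan and
the five-residue test sequence are hypothetical, positions are 1-based, and residues that are
already alanine are simply skipped.

```python
def alanine_scan(protein, positions=None):
    """Yield (label, variant) pairs in which one residue at a time is
    replaced by alanine. Positions are 1-based; labels look like 'K2A'."""
    if positions is None:
        positions = range(1, len(protein) + 1)
    for pos in positions:
        if not 1 <= pos <= len(protein):
            raise ValueError(f"position {pos} is outside the sequence")
        residue = protein[pos - 1]
        if residue == "A":
            continue  # already alanine, nothing to substitute
        variant = protein[:pos - 1] + "A" + protein[pos:]
        yield f"{residue}{pos}A", variant


# Hypothetical five-residue sequence used only for illustration.
for label, variant in alanine_scan("MKTAY"):
    print(label, variant)
```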

In recent years, techniques for estimating the equilibrium constant for ligand binding
using minuscule amounts of protein have been developed (Blackburn et al., 1991; U.S. Patents
5,221,605 and 5,238,808). The ability to perform functional assays with small amounts of
material can be exploited to develop highly efficient, in vitro methodologies for the saturation
mutagenesis of antibodies. The inventors bypassed cloning steps by combining PCR mutagenesis
with coupled in vitro transcription/translation for the high throughput generation of protein
mutants. Here, the PCR products are used directly as the template for the in vitro
transcription/translation of the mutant single chain antibodies. Because of the high efficiency
with which all 19 amino acid substitutions can be generated and analyzed in this way, it is now
possible to perform saturation mutagenesis on numerous residues of interest, a process that can
be described as in vitro scanning saturation mutagenesis (Burks et al., 1997).

In vitro scanning saturation mutagenesis provides a rapid method for obtaining a large
amount of structure-function information including: (i) identification of residues that modulate
ligand binding specificity, (ii) a better understanding of ligand binding based on the
identification of those amino acids that retain activity and those that abolish activity at a given
location, (iii) an evaluation of the overall plasticity of an active site or protein subdomain, and
(iv) identification of amino acid substitutions that result in increased binding.

v) Random Mutagenesis by Fragmentation and Reassembly
A method for generating libraries of displayed polypeptides is described in U.S. Patent
5,380,721. The method comprises obtaining polynucleotide library members, pooling and
fragmenting the polynucleotides, reforming fragments therefrom, and performing PCR
amplification, thereby homologously recombining the fragments to form a shuffled pool of
recombined polynucleotides.

b. Site-Directed Mutagenesis 
Structure-guided site-specific mutagenesis represents a powerful tool for the dissection
and engineering of protein-ligand interactions. The technique provides for the preparation and
testing of sequence variants by introducing one or more nucleotide sequence changes into a
selected DNA.

Site-specific mutagenesis uses specific oligonucleotide sequences which encode the DNA
sequence of the desired mutation, as well as a sufficient number of adjacent, unmodified
nucleotides. In this way, a primer sequence is provided with sufficient size and complexity to
form a stable duplex on both sides of the deletion junction being traversed. A primer of about 17
to 25 nucleotides in length is preferred, with about 5 to 10 residues on both sides of the junction
of the sequence being altered.
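
The primer geometry described above (a changed sequence flanked on each side by unmodified
template) can be sketched as follows. This is an illustrative assumption-laden example, not a
validated primer design tool: the function name mutagenic_primer and the template sequence are
hypothetical, indices are 0-based, the primer is written in the same sense as the sequence
supplied, and no melting-temperature or secondary-structure checks are made.

```python
def mutagenic_primer(template, start, replacement, flank=10):
    """Build a primer carrying `replacement` in place of the template bases
    beginning at 0-based index `start`, with `flank` unmodified bases on
    each side. The primer has the same sense as `template`."""
    template = template.upper()
    replacement = replacement.upper()
    end = start + len(replacement)
    if start - flank < 0 or end + flank > len(template):
        raise ValueError("not enough template sequence for the requested flanks")
    return template[start - flank:start] + replacement + template[end:end + flank]


# Hypothetical template; change the base at index 18 (a G) to C.
template = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTC"
primer = mutagenic_primer(template, start=18, replacement="C", flank=10)
print(primer, len(primer))  # 10 + 1 + 10 = 21 nucleotides, within 17 to 25
```

If the single-stranded template being primed is the strand shown, the reverse complement of this
primer would be the one to synthesize.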

The technique typically employs a bacteriophage vector that exists in both a
single-stranded and double-stranded form. Vectors useful in site-directed mutagenesis include
vectors such as the M13 phage. These phage vectors are commercially available and their use is
generally well known to those skilled in the art. Double-stranded plasmids are also routinely
employed in site-directed mutagenesis, which eliminates the step of transferring the gene of
interest from a phage to a plasmid.



In general, one first obtains a single-stranded vector, or melts two strands of a
double-stranded vector, which includes within its sequence a DNA sequence encoding the desired
protein or genetic element. An oligonucleotide primer bearing the desired mutated sequence,
synthetically prepared, is then annealed with the single-stranded DNA preparation, taking into
account the degree of mismatch when selecting hybridization conditions. The hybridized
product is subjected to DNA polymerizing enzymes such as E. coli polymerase I (Klenow
fragment) in order to complete the synthesis of the mutation-bearing strand. Thus, a
heteroduplex is formed, wherein one strand encodes the original non-mutated sequence, and the
second strand bears the desired mutation. This heteroduplex vector is then used to transform
appropriate host cells, such as E. coli cells, and clones are selected that include recombinant
vectors bearing the mutated sequence arrangement.

Comprehensive information on the functional significance and information content of a
given residue of protein can best be obtained by saturation mutagenesis in which all 19 amino
acid substitutions are examined. The shortcoming of this approach is that the logistics of
multiresidue saturation mutagenesis are daunting (Warren et al., 1996; Zeng et al., 1996; Yelton
et al., 1995; Hilton et al., 1996). Hundreds, and possibly even thousands, of site specific mutants
must be studied. However, improved techniques make production and rapid screening of
mutants much more straightforward. See also U.S. Patents 5,798,208 and 5,830,650 for a
description of "walk-through" mutagenesis.
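
The scale of the problem follows directly from the count of 19 substitutions per residue. The
short sketch below enumerates the single-site variants for one position and totals the library
size for several positions; the names saturation_variants and library_size, and the example
sequence, are hypothetical and serve only to make the arithmetic concrete.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def saturation_variants(protein, pos):
    """Return labels (e.g. 'K2A') for all 19 substitutions at 1-based pos."""
    wild_type = protein[pos - 1]
    return [f"{wild_type}{pos}{aa}" for aa in AMINO_ACIDS if aa != wild_type]


def library_size(n_positions):
    """Number of single-site mutants when n_positions are each saturated."""
    return 19 * n_positions


print(len(saturation_variants("MKTAY", 2)))  # 19 variants at one position
print(library_size(50))                      # 950 mutants for 50 positions
```

Saturating even a few dozen residues therefore already calls for hundreds of individual mutants,
consistent with the estimate given above.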

Other methods of site-directed mutagenesis are disclosed in U.S. Patents 5,220,007;
5,284,760; 5,354,670; 5,366,878; 5,389,514; 5,635,377; and 5,789,166.

B. Protein Expression
1. Vectors

Once the soluble protein, polypeptide or peptide encoding sequence(s) and alpha
complementing protein, polypeptide or peptide encoding sequence(s) are selected, they may be
operatively expressed in a recombinant vector. The expression may be in vivo or in vitro, to
assay the refolding and complementation process. The term "vector" is used to refer to a carrier
nucleic acid molecule into which a nucleic acid sequence can be inserted for introduction into a
cell where it can be replicated. A nucleic acid sequence can be "exogenous," which means that it
is foreign to the cell into which the vector is being introduced or that the sequence is homologous
to a sequence in the cell but in a position within the host cell nucleic acid in which the sequence
is ordinarily not found. Vectors include plasmids, cosmids, viruses (bacteriophage, animal
viruses, and plant viruses), and artificial chromosomes (e.g., YACs). One of skill in the art
would be well equipped to construct a vector through standard recombinant techniques, which
are described in Sambrook et al., 1989 and Ausubel et al., 1994, both incorporated herein by
reference.

The term "expression vector" refers to a vector containing a nucleic acid sequence coding
for at least part of a gene product capable of being transcribed. In some cases, RNA molecules
are then translated into a protein, polypeptide, or peptide. In other cases, these sequences are not
translated, for example, in the production of antisense molecules or ribozymes. Expression
vectors can contain a variety of "control sequences," which refer to nucleic acid sequences
necessary for the transcription and possibly translation of an operably linked coding sequence in
a particular host organism. In addition to control sequences that govern transcription and
translation, vectors and expression vectors may contain nucleic acid sequences that serve other
functions as well and are described infra.

a. Promoters and Enhancers

A "promoter" is a control sequence that is a region of a nucleic acid sequence at which
initiation and rate of transcription are controlled. It may contain genetic elements at which
regulatory proteins and molecules may bind, such as RNA polymerase and other transcription
factors. The phrases "operatively positioned," "operatively linked," "under control," and "under
transcriptional control" mean that a promoter is in a correct functional location and/or orientation
in relation to a nucleic acid sequence to control transcriptional initiation and/or expression of that
sequence. A promoter may or may not be used in conjunction with an "enhancer," which refers
to a cis-acting regulatory sequence involved in the transcriptional activation of a nucleic acid
sequence.

A promoter may be one naturally associated with a gene or sequence, as may be obtained
by isolating the 5' non-coding sequences located upstream of the coding segment and/or exon.
Such a promoter can be referred to as "endogenous." Similarly, an enhancer may be one
naturally associated with a nucleic acid sequence, located either downstream or upstream of that
sequence. Alternatively, certain advantages will be gained by positioning the coding nucleic
acid segment under the control of a recombinant or heterologous promoter, which refers to a
promoter that is not normally associated with a nucleic acid sequence in its natural environment.
A recombinant or heterologous enhancer refers also to an enhancer not normally associated with
a nucleic acid sequence in its natural environment. Such promoters or enhancers may include
promoters or enhancers of other genes, and promoters or enhancers isolated from any other
prokaryotic, viral, or eukaryotic cell, and promoters or enhancers not "naturally occurring," i.e.,
containing different elements of different transcriptional regulatory regions, and/or mutations
that alter expression. In addition to producing nucleic acid sequences of promoters and
enhancers synthetically, sequences may be produced using recombinant cloning and/or nucleic
acid amplification technology, including PCR™, in connection with the compositions disclosed
herein (see U.S. Patent 4,683,202, U.S. Patent 5,928,906, each incorporated herein by reference).
Furthermore, it is contemplated that the control sequences that direct transcription and/or
expression of sequences within non-nuclear organelles such as mitochondria, chloroplasts, and
the like, can be employed as well.

Naturally, it will be important to employ a promoter and/or enhancer that effectively
directs the expression of the DNA segment in the cell type, organelle, and organism chosen for
expression. Those of skill in the art of molecular biology generally know the use of promoters,
enhancers, and cell type combinations for protein expression; for example, see Sambrook et al.
(1989), incorporated herein by reference. The promoters employed may be constitutive,
tissue-specific, inducible, and/or useful under the appropriate conditions to direct high level
expression of the introduced DNA segment, such as is advantageous in the large-scale production
of recombinant proteins and/or peptides. The promoter may be heterologous or endogenous.

Table 1 lists several elements/promoters that may be employed, in the context of the
present invention, to regulate the expression of a gene. This list is not intended to be exhaustive
of all the possible elements involved in the promotion of expression but, merely, to be exemplary
thereof. Table 2 provides examples of inducible elements, which are regions of a nucleic acid
sequence that can be activated in response to a specific stimulus.

TABLE 1
Promoter and/or Enhancer

Promoter/Enhancer | References
Immunoglobulin Heavy Chain | Banerji et al., 1983; Gilles et al., 1983; Grosschedl et al., 1985; Atchinson et al., 1986, 1987; Imler et al., 1987; Weinberger et al., 1984; Kiledjian et al., 1988; Porton et al., 1990
Immunoglobulin Light Chain | Queen et al., 1983; Picard et al., 1984
T-Cell Receptor | Luria et al., 1987; Winoto et al., 1989; Redondo et al., 1990
HLA DQ α and/or DQ β | Sullivan et al., 1987
β-Interferon | Goodbourn et al., 1986; Fujita et al., 1987; Goodbourn et al., 1988
Interleukin-2 | Greene et al., 1989
Interleukin-2 Receptor | Greene et al., 1989; Lin et al., 1990
MHC Class II 5 | Koch et al., 1989
MHC Class II HLA-DRα | Sherman et al., 1989
β-Actin | Kawamoto et al., 1988; Ng et al., 1989
Muscle Creatine Kinase (MCK) | Jaynes et al., 1988; Horlick et al., 1989; Johnson et al., 1989
Prealbumin (Transthyretin) | Costa et al., 1988
Elastase I | Ornitz et al., 1987
Metallothionein (MTII) | Karin et al., 1987; Culotta et al., 1989
Collagenase | Pinkert et al., 1987; Angel et al., 1987
Albumin | Pinkert et al., 1987; Tronche et al., 1989, 1990
α-Fetoprotein | Godbout et al., 1988; Campere et al., 1989
t-Globin | Bodine et al., 1987; Perez-Stable et al., 1990
β-Globin | Trudel et al., 1987
c-fos | Cohen et al., 1987
c-HA-ras | Triesman, 1986; Deschamps et al., 1985
Insulin | Edlund et al., 1985
Neural Cell Adhesion Molecule (NCAM) | Hirsh et al., 1990
α1-Antitrypsin | Latimer et al., 1990
H2B (TH2B) Histone | Hwang et al., 1990
Mouse and/or Type I Collagen | Ripe et al., 1989
Glucose-Regulated Proteins (GRP94 and GRP78) | Chang et al., 1989
Rat Growth Hormone | Larsen et al., 1986
Human Serum Amyloid A (SAA) | Edbrooke et al., 1989
Troponin I (TN I) | Yutzey et al., 1989
Platelet-Derived Growth Factor (PDGF) | Pech et al., 1989
Duchenne Muscular Dystrophy | Klamut et al., 1990
SV40 | Banerji et al., 1981; Moreau et al., 1981; Sleigh et al., 1985; Firak et al., 1986; Herr et al., 1986; Imbra et al., 1986; Kadesch et al., 1986; Wang et al., 1986; Ondek et al., 1987; Kuhl et al., 1987; Schaffner et al., 1988
Polyoma | Swartzendruber et al., 1975; Vasseur et al., 1980; Katinka et al., 1980, 1981; Tyndell et al., 1981; Dandolo et al., 1983; de Villiers et al., 1984; Hen et al., 1986; Satake et al., 1988; Campbell and/or Villarreal, 1988
Retroviruses | Kriegler et al., 1982, 1983; Levinson et al., 1982; Kriegler et al., 1983, 1984a, b, 1988; Bosze et al., 1986; Miksicek et al., 1986; Celander et al., 1987; Thiesen et al., 1988; Celander et al., 1988; Choi et al., 1988; Reisman et al., 1989
Papilloma Virus | Campo et al., 1983; Lusky et al., 1983; Spandidos and/or Wilkie, 1983; Spalholz et al., 1985; Lusky et al., 1986; Cripe et al., 1987; Gloss et al., 1987; Hirochika et al., 1987; Stephens et al., 1987; Glue et al., 1988
Hepatitis B Virus | Bulla et al., 1986; Jameel et al., 1986; Shaul et al., 1987; Spandau et al., 1988; Vannice et al., 1988
Human Immunodeficiency Virus | Muesing et al., 1987; Hauber et al., 1988; Jakobovits et al., 1988; Feng et al., 1988; Takebe et al., 1988; Rosen et al., 1988; Berkhout et al., 1989; Laspia et al., 1989; Sharp et al., 1989; Braddock et al., 1989
Cytomegalovirus (CMV) | Weber et al., 1984; Boshart et al., 1985; Foecking et al., 1986
Gibbon Ape Leukemia Virus | Holbrook et al., 1987; Quinn et al., 1989



TABLE 2
Inducible Elements

Element | Inducer | References
MT II | Phorbol Ester (TPA); Heavy metals | Palmiter et al., 1982; Haslinger et al., 1985; Searle et al., 1985; Stuart et al., 1985; Imagawa et al., 1987; Karin et al., 1987; Angel et al., 1987b; McNeall et al., 1989
MMTV (mouse mammary tumor virus) | Glucocorticoids | Huang et al., 1981; Lee et al., 1981; Majors et al., 1983; Chandler et al., 1983; Lee et al., 1984; Ponta et al., 1985; Sakai et al., 1988
β-Interferon | Poly(rI)x; Poly(rc) | Tavernier et al., 1983
Adenovirus 5 E2 | E1A | Imperiale et al., 1984
Collagenase | Phorbol Ester (TPA) | Angel et al., 1987a
Stromelysin | Phorbol Ester (TPA) | Angel et al., 1987b
SV40 | Phorbol Ester (TPA) | Angel et al., 1987b
Murine MX Gene | Interferon, Newcastle Disease Virus | Hug et al., 1988
GRP78 Gene | A23187 | Resendez et al., 1988
α-2-Macroglobulin | IL-6 | Kunz et al., 1989
Vimentin | Serum | Rittling et al., 1989
MHC Class I Gene H-2Kb | Interferon | Blanar et al., 1989
HSP70 | E1A, SV40 Large T Antigen | Taylor et al., 1989, 1990a, 1990b
Proliferin | Phorbol Ester-TPA | Mordacq et al., 1989
Tumor Necrosis Factor | PMA | Hensel et al., 1989
Thyroid Stimulating Hormone α Gene | Thyroid Hormone | Chatterjee et al., 1989



The identity of tissue-specific promoters or elements, as well as assays to characterize
their activity, is well known to those of skill in the art. Examples of such regions include the
human LIMK2 gene (Nomoto et al., 1999), the somatostatin receptor 2 gene (Kraus et al., 1998),
murine epididymal retinoic acid-binding gene (Lareyre et al., 1999), human CD4 (Zhao-Emonet
et al., 1998), mouse alpha2 (XI) collagen (Tsumaki et al., 1998), D1A dopamine receptor gene
(Lee et al., 1997), insulin-like growth factor II (Wu et al., 1997), and human platelet endothelial
cell adhesion molecule-1 (Almendro et al., 1996).

b. Initiation Signals and Internal Ribosome Binding Sites

A specific initiation signal also may be required for efficient translation of coding
sequences. These signals include the ATG initiation codon or adjacent sequences. Exogenous
translational control signals, including the ATG initiation codon, may need to be provided. One
of ordinary skill in the art would readily be capable of determining this and providing the
necessary signals. It is well known that the initiation codon must be "in-frame" with the reading
frame of the desired coding sequence to ensure translation of the entire insert. The exogenous
translational control signals and initiation codons can be either natural or synthetic. The
efficiency of expression may be enhanced by the inclusion of appropriate transcription enhancer
elements.
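
The "in-frame" requirement can be stated as a simple check: the distance between the first base
of the initiation codon and the first base of the coding sequence must be a multiple of three.
The sketch below illustrates this under stated assumptions; the function name atg_in_frame and
the example construct are hypothetical, indices are 0-based, and intervening stop codons are not
examined.

```python
def atg_in_frame(construct, atg_index, cds_start):
    """Return True if the ATG at 0-based atg_index shares a reading frame
    with the coding sequence that begins at 0-based cds_start."""
    construct = construct.upper()
    if construct[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    return (cds_start - atg_index) % 3 == 0


# Hypothetical construct with an exogenous ATG at index 2.
construct = "CCATGGCTAAAGGT"
print(atg_in_frame(construct, atg_index=2, cds_start=5))  # insert joined in frame
print(atg_in_frame(construct, atg_index=2, cds_start=6))  # shifted by one base
```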

In certain embodiments of the invention, internal ribosome entry site (IRES) elements
are used to create multigene, or polycistronic, messages. IRES elements are able to
bypass the ribosome scanning model of 5' methylated Cap dependent translation and begin
translation at internal sites (Pelletier and Sonenberg, 1988). IRES elements from two members
of the picornavirus family (polio and encephalomyocarditis) have been described (Pelletier and
Sonenberg, 1988), as well as an IRES from a mammalian message (Macejak and Sarnow, 1991).
IRES elements can be linked to heterologous open reading frames. Multiple open reading
frames can be transcribed together, each separated by an IRES, creating polycistronic messages.
By virtue of the IRES element, each open reading frame is accessible to ribosomes for efficient
translation. Multiple genes can be efficiently expressed using a single promoter/enhancer to
transcribe a single message (see U.S. Patents 5,925,565 and 5,935,819, herein incorporated by
reference).

c. Multiple Cloning Sites 
Vectors can include a multiple cloning site (MCS), which is a nucleic acid region that
contains multiple restriction enzyme sites, any of which can be used in conjunction with standard
recombinant technology to digest the vector. (See Carbonelli et al., 1999, Levenson et al., 1998,
and Cocea, 1997, incorporated herein by reference.) "Restriction enzyme digestion" refers to
catalytic cleavage of a nucleic acid molecule with an enzyme that functions only at specific
locations in a nucleic acid molecule. Many of these restriction enzymes are commercially
available. Use of such enzymes is widely understood by those of skill in the art. Frequently, a
vector is linearized or fragmented using a restriction enzyme that cuts within the MCS to enable
exogenous sequences to be ligated to the vector. "Ligation" refers to the process of forming
phosphodiester bonds between two nucleic acid fragments, which may or may not be contiguous
with each other. Techniques involving restriction enzymes and ligation reactions are well known
to those of skill in the art of recombinant technology.
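
As a minimal sketch of how restriction sites in a multiple cloning site might be located and a
linear molecule cut in silico, consider the example below. The recognition sequences and cut
positions for EcoRI, BamHI and HindIII are the well-known ones; everything else (the function
names find_sites and digest_linear, the toy MCS sequence) is hypothetical, and only the top
strand of a linear molecule is modeled.

```python
# Recognition sequence and top-strand cut offset (cut falls after this many bases).
SITES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def find_sites(seq, enzyme):
    """Return 0-based start indices of every recognition site in seq."""
    site, _ = SITES[enzyme]
    seq = seq.upper()
    hits = []
    i = seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def digest_linear(seq, enzyme):
    """Cut a linear top strand at each site and return the fragments in order."""
    _, offset = SITES[enzyme]
    seq = seq.upper()
    fragments, prev = [], 0
    for start in find_sites(seq, enzyme):
        cut = start + offset
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])
    return fragments


# Toy MCS: EcoRI, BamHI and HindIII sites in a row, padded with T residues.
mcs = "TTTGAATTCGGATCCAAGCTTTTT"
print(find_sites(mcs, "EcoRI"))
print(digest_linear(mcs, "BamHI"))
```

Joining the returned fragments in order regenerates the input sequence, which mirrors the
ligation of contiguous fragments described above.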

d. Splicing Sites 

Most transcribed eukaryotic RNA molecules will undergo RNA splicing to remove
introns from the primary transcripts. Vectors containing genomic eukaryotic sequences may
require donor and/or acceptor splicing sites to ensure proper processing of the transcript for
protein expression. (See Chandler et al., 1997, herein incorporated by reference.)

e. Polyadenylation Signals

In expression, one will typically include a polyadenylation signal to effect proper
polyadenylation of the transcript. The nature of the polyadenylation signal is not believed to be
crucial to the successful practice of the invention, and/or any such sequence may be employed.
Preferred embodiments include the SV40 polyadenylation signal and/or the bovine growth
hormone polyadenylation signal, convenient and/or known to function well in various target
cells. Also contemplated as an element of the expression cassette is a transcriptional termination
site. These elements can serve to enhance message levels and/or to minimize read through from
the cassette into other sequences.

f. Origins of Replication

In order to be propagated in a host cell, a vector may contain one or more origin of
replication sites (often termed "ori"), each of which is a specific nucleic acid sequence at which
replication is initiated. Alternatively, an autonomously replicating sequence (ARS) can be
employed if the host cell is yeast.

g. Selectable and Screenable Markers 
In certain embodiments of the invention, a cell containing a nucleic acid construct of the
present invention may be identified in vitro or in vivo by including a marker in the
expression vector. Such markers would confer an identifiable change to the cell permitting easy
identification of cells containing the expression vector. Generally, a selectable marker is one that
confers a property that allows for selection. A positive selectable marker is one in which the
presence of the marker allows for its selection, while a negative selectable marker is one in
which its presence prevents its selection. An example of a positive selectable marker is a drug
resistance marker.

Usually the inclusion of a drug selection marker aids in the cloning and identification of
transformants; for example, genes that confer resistance to neomycin, puromycin, hygromycin,
DHFR, GPT, zeocin and histidinol are useful selectable markers. In addition to markers
conferring a phenotype that allows for the discrimination of transformants based on the
implementation of conditions, other types of markers including screenable markers such as GFP,
whose basis is colorimetric analysis, are also contemplated. Alternatively, screenable enzymes
such as herpes simplex virus thymidine kinase (tk) or chloramphenicol acetyltransferase (CAT)
may be utilized. One of skill in the art would also know how to employ immunologic markers,
possibly in conjunction with FACS analysis. The marker used is not believed to be important, so
long as it is capable of being expressed simultaneously with the nucleic acid encoding a gene
product. Further examples of selectable and screenable markers are well known to one of skill in
the art.

2. Host Cells 

As used herein, the terms "cell," "cell line," and "cell culture" may be used
interchangeably. All of these terms also include their progeny, which is any and all subsequent
generations. It is understood that all progeny may not be identical due to deliberate or
inadvertent mutations. In the context of expressing a heterologous nucleic acid sequence, "host
cell" refers to a prokaryotic or eukaryotic cell, and it includes any transformable organism that
is capable of replicating a vector and/or expressing a heterologous gene encoded by a vector. A
host cell can be, and has been, used as a recipient for vectors. A host cell may be "transfected" or
"transformed," which refers to a process by which exogenous nucleic acid is transferred or
introduced into the host cell. A transformed cell includes the primary subject cell and its
progeny.

Host cells may be derived from prokaryotes or eukaryotes, depending upon whether the
desired result is replication of the vector or expression of part or all of the vector-encoded
nucleic acid sequences. Prokaryotes include gram negative or positive cells. Numerous cell
lines and cultures are available for use as a host cell, and they can be obtained through the
American Type Culture Collection (ATCC), which is an organization that serves as an archive
for living cultures and genetic materials (www.atcc.org). An appropriate host can be determined
by one of skill in the art based on the vector backbone and the desired result. A plasmid or
cosmid, for example, can be introduced into a prokaryote host cell for replication of many
vectors. Bacterial cells used as host cells for vector replication and/or expression include DH5α,
JM109, and KC8, as well as a number of commercially available bacterial hosts such as SURE®
Competent Cells and Solopack™ Gold Cells (Stratagene®, La Jolla). Alternatively, bacterial
cells such as E. coli LE392 could be used as host cells for phage viruses.

Examples of eukaryotic host cells for replication and/or expression of a vector include C.
elegans, HeLa, NIH3T3, Jurkat, 293, Cos, CHO, Saos, yeast, nematodes, insect cells, and PC12.
Many host cells from various cell types and organisms are available and would be known to one
of skill in the art. Similarly, a viral vector may be used in conjunction with either a eukaryotic or
prokaryotic host cell, particularly one that is permissive for replication or expression of the
vector.

Some vectors may employ control sequences that allow them to be replicated and/or
expressed in both prokaryotic and eukaryotic cells. One of skill in the art would further
understand the conditions under which to incubate all of the above described host cells to
maintain them and to permit replication of a vector. Also understood and known are techniques
and conditions that would allow large-scale production of vectors, as well as production of the
nucleic acids encoded by vectors and their cognate polypeptides, proteins, or peptides.

3. Expression Systems 

Turning to the expression of the proteins of the present invention, once a suitable nucleic
acid encoding sequence has been obtained, one may proceed to prepare an expression system.
The engineering of DNA segment(s) for expression in a prokaryotic or eukaryotic system may be
performed by techniques generally known to those of skill in recombinant expression.

It is believed that virtually any expression system may be employed in the expression of
the proteins of the present invention. Prokaryote- and/or eukaryote-based systems can be
employed for use with the present invention to produce nucleic acid sequences, or their cognate
polypeptides, proteins and peptides. Many such systems are commercially and widely available.

Both cDNA and genomic sequences are suitable for eukaryotic expression, as the host
cell will generally process the genomic transcripts to yield functional mRNA for translation into
protein. Generally speaking, it may be more convenient to employ as the recombinant gene a
cDNA version of the gene. It is believed that the use of a cDNA version will provide advantages
in that the size of the gene will generally be much smaller and more readily employed to
transfect the targeted cell than will a genomic gene, which will typically be up to an order of
magnitude or more larger than the cDNA gene. However, it is contemplated that a genomic
version of a particular gene may be employed where desired.

It is contemplated that proteins, polypeptides or peptides may be co-expressed with other
selected proteins, polypeptides or peptides, wherein the proteins may be co-expressed in the
same cell or gene(s) may be provided to a cell that already has another selected protein.
Co-expression may be achieved by co-transfecting the cell with two distinct recombinant
vectors, each bearing a copy of either of the respective DNA. Alternatively, a single
recombinant vector may be constructed to include the coding regions for both of the proteins,
which could then be expressed in cells transfected with the single vector. In either event, the
term "co-expression" herein refers to the expression of both at least one selected nucleic acid or
gene encoding one or more proteins, polypeptides or peptides and at least a second selected
nucleic acid or gene encoding at least one or more secondary selected proteins, polypeptides or
peptides in the same recombinant cell.

It is contemplated that proteins may be expressed in cell systems or grown in media that 
enhance protein production. One such system is described in U.S. Patent 5,834,249, 
incorporated herein by reference. In certain embodiments, the fusion protein may be
co-expressed with one or more proteins that enhance refolding. Such proteins that enhance
refolding include, for example, DsbA or DsbC proteins. A cell system co-expressing the DsbA
or DsbC proteins is described in U.S. Patent 5,639,635, incorporated herein by reference. In
certain embodiments, it is contemplated that a temperature sensitive expression vector may be
used to aid assaying protein folding at temperatures lower or higher than the optimum growth
temperature of many E. coli strains, about 37°C. For example, temperature sensitive expression
vectors and host cells that express proteins at or below 20°C are described in U.S. Patents
5,654,169 and 5,726,039, each incorporated herein by reference.

As used herein, the terms "engineered" and "recombinant" cells or host cells are intended
to refer to a cell into which an exogenous DNA segment or gene, such as a cDNA or gene
encoding at least one protein, polypeptide or peptide has been introduced. Therefore, engineered
cells are distinguishable from naturally occurring cells which do not contain a recombinantly
introduced exogenous DNA segment or gene. Engineered cells are thus cells having a gene or
genes introduced through the hand of man. Recombinant cells include those having an
introduced cDNA or genomic gene, and also include genes positioned adjacent to a promoter not
naturally associated with the particular introduced gene.

Certain examples of prokaryotic hosts are E. coli strain RR1, E. coli LE392, E. coli B,
E. coli χ1776 (ATCC No. 31537) as well as E. coli W3110 (F-, lambda-, prototrophic, ATCC
No. 273325); bacilli such as Bacillus subtilis; and other enterobacteriaceae such as Salmonella
typhimurium, Serratia marcescens, and various Pseudomonas species.

In general, plasmid vectors containing replicon and control sequences which are derived
from species compatible with the host cell are used in connection with these hosts. The vector
ordinarily carries a replication site, as well as marking sequences which are capable of providing
phenotypic selection in transformed cells. For example, E. coli is often transformed using
derivatives of pBR322, a plasmid derived from an E. coli species. pBR322 contains genes for
ampicillin and tetracycline resistance and thus provides easy means for identifying transformed
cells. The pBR plasmid, or other microbial plasmid or phage must also contain, or be modified
to contain, promoters which can be used by the microbial organism for expression of its own
proteins.



In addition, phage vectors containing replicon and control sequences that are compatible 
with the host microorganism can be used as transforming vectors in connection with these hosts. 
For example, the phage lambda GEM™-11 may be utilized in making a recombinant phage
vector which can be used to transform host cells, such as E. coli LE392.

Further useful vectors include pIN vectors (Inouye et al., 1985); and pGEX vectors, for
use in generating glutathione S-transferase (GST) soluble fusion proteins for later purification
and separation or cleavage. Other suitable fusion proteins are those with β-galactosidase,
ubiquitin, and the like.

Promoters that are most commonly used in recombinant DNA construction include the
β-lactamase (penicillinase), lactose and tryptophan (trp) promoter systems. While these are the
most commonly used, other microbial promoters have been discovered and utilized, and details
concerning their nucleotide sequences have been published, enabling those of skill in the art to
ligate them functionally with plasmid vectors.

a. Prokaryotic Expression 

The following details concerning recombinant protein production in bacterial cells, such
as E. coli, are provided by way of exemplary information on recombinant protein production in
general, the adaptation of which to a particular recombinant expression system will be known to
those of skill in the art.

Bacterial cells, for example, E. coli, containing the expression vector are grown in any of
a number of suitable media, for example, LB. The expression of the recombinant protein may be
induced, e.g., by adding IPTG to the media or by switching incubation to a higher temperature.
After culturing the bacteria for a further period, generally of between 2 and 24 hours, the cells
are collected by centrifugation and washed to remove residual media.

The bacterial cells are then lysed, for example, by disruption in a cell homogenizer and
centrifuged to separate the dense inclusion bodies and cell membranes from the soluble cell
components. This centrifugation can be performed under conditions whereby the dense
inclusion bodies are selectively enriched by incorporation of sugars, such as sucrose, into the
buffer and centrifugation at a selective speed.

If the recombinant protein is expressed in the inclusion bodies, as is the case in many
instances, these can be washed in any of several solutions to remove some of the contaminating
host proteins, then solubilized in solutions containing high concentrations of urea (e.g., 8 M) or
chaotropic agents such as guanidine hydrochloride in the presence of reducing agents, such as
β-mercaptoethanol or DTT (dithiothreitol).

Under some circumstances, it may be advantageous to incubate the protein for several
hours under conditions suitable for the protein to undergo a refolding process into a
conformation which more closely resembles that of the native protein. Such conditions generally
include low protein concentrations, less than 500 mg/ml, low levels of reducing agent,
concentrations of urea less than 2 M and often the presence of reagents such as a mixture of
reduced and oxidized glutathione which facilitate the interchange of disulfide bonds within the
protein molecule.

The refolding process can be monitored, for example, by SDS-PAGE, or with antibodies
specific for the native molecule (which can be obtained from animals vaccinated with the native
molecule or smaller quantities of recombinant protein). Following refolding, the protein can
then be purified further and separated from the refolding mixture by chromatography on any of
several supports including ion exchange resins, gel permeation resins or on a variety of affinity
columns.
b. Eukaryotic Expression

In addition to micro-organisms, cultures of cells derived from multicellular organisms
may also be used as hosts. In principle, any such cell culture is workable, whether from
vertebrate or invertebrate culture. In addition to mammalian cells, these include insect cell
systems infected with recombinant virus expression vectors (e.g., baculovirus); and plant cell
systems infected with recombinant virus expression vectors (e.g., cauliflower mosaic virus,
CaMV; tobacco mosaic virus, TMV) or transformed with recombinant plasmid expression
vectors (e.g., Ti plasmid) containing one or more protein, polypeptide or peptide coding
sequences.

For expression in Saccharomyces, the plasmid YRp7, for example, is commonly used.
This plasmid already contains the trp1 gene which provides a selection marker for a mutant strain
of yeast lacking the ability to grow in the absence of tryptophan, for example ATCC No. 44076 or
PEP4-1. The presence of the trp1 lesion as a characteristic of the yeast host cell genome then
provides an effective environment for detecting transformation by growth in the absence of
tryptophan.

Suitable promoting sequences in yeast vectors include the promoters for
3-phosphoglycerate kinase or other glycolytic enzymes, such as enolase, glyceraldehyde-3-phosphate
dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate
isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase,
phosphoglucose isomerase, and glucokinase. In constructing suitable expression plasmids, the
termination sequences associated with these genes are also ligated into the expression vector 3'
of the sequence desired to be expressed to provide polyadenylation of the mRNA and
termination.

Other suitable promoters, which have the additional advantage of transcription controlled
by growth conditions, include the promoter region for alcohol dehydrogenase 2, isocytochrome
C, acid phosphatase, degradative enzymes associated with nitrogen metabolism, and the
aforementioned glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for
maltose and galactose utilization.
The insect cell/baculovirus system can produce a high level of protein expression of a
heterologous nucleic acid segment, such as described in U.S. Patent Nos. 5,871,986 and 4,879,236,
both herein incorporated by reference, and which can be bought, for example, under the name
MaxBac® 2.0 from Invitrogen® and BacPack™ Baculovirus Expression System from
Clontech®.

In a useful insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is
used as a vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. The
protein, polypeptide or peptide coding sequences are cloned into non-essential regions (for
example the polyhedrin gene) of the virus and placed under control of an AcNPV promoter (for
example the polyhedrin promoter). Successful insertion of the coding sequences results in the
inactivation of the polyhedrin gene and production of non-occluded recombinant virus (i.e., virus
lacking the proteinaceous coat coded for by the polyhedrin gene). These recombinant viruses are
then used to infect Spodoptera frugiperda cells in which the inserted gene is expressed (e.g., U.S.
Patent No. 4,215,051, Smith, incorporated herein by reference).

Other examples of expression systems include Stratagene®'s Complete Control™
Inducible Mammalian Expression System, which involves a synthetic ecdysone-inducible
receptor, or its pET Expression System, an E. coli expression system. Another example of an
inducible expression system is available from INVITROGEN®, which carries the T-Rex™
(tetracycline-regulated expression) System, an inducible mammalian expression system that uses
the full-length CMV promoter. INVITROGEN® also provides a yeast expression system called the
Pichia methanolica Expression System, which is designed for high-level production of
recombinant proteins in the methylotrophic yeast Pichia methanolica. One of skill in the art
would know how to express a vector, such as an expression construct, to produce a nucleic acid
sequence or its cognate polypeptide, protein, or peptide.

Examples of useful mammalian host cell lines are VERO and HeLa cells, Chinese
hamster ovary (CHO) cell lines, WI38, BHK, COS-7, 293, HepG2, 3T3, RIN and MDCK cell
lines. In addition, a host cell strain may be chosen that modulates the expression of the inserted
sequences, or modifies and processes the gene product in the specific fashion desired. Such
modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products may be
important for the function of the protein.

Different host cells have characteristic and specific mechanisms for the post-translational
processing and modification of proteins. Appropriate cell lines or host systems can be chosen to
ensure the correct modification and processing of the foreign protein expressed.

Expression vectors for use in mammalian cells ordinarily include an origin of replication
(as necessary), a promoter located in front of the gene to be expressed, along with any necessary
ribosome binding sites, RNA splice sites, polyadenylation site, and transcriptional terminator
sequences. The origin of replication may be provided either by construction of the vector to
include an exogenous origin, such as may be derived from SV40 or other viral (e.g., Polyoma,
Adeno, VSV, BPV) source, or may be provided by the host cell chromosomal replication
mechanism. If the vector is integrated into the host cell chromosome, the latter is often
sufficient.

The promoters may be derived from the genome of mammalian cells (e.g.,
metallothionein promoter) or from mammalian viruses (e.g., the adenovirus late promoter; the
vaccinia virus 7.5K promoter). Further, it is also possible, and may be desirable, to utilize
promoter or control sequences normally associated with the gene sequence(s), provided such
control sequences are compatible with the host cell systems.

A number of viral-based expression systems may be utilized; for example, commonly
used promoters are derived from polyoma, Adenovirus 2, and most frequently Simian Virus 40
(SV40). The early and late promoters of SV40 virus are particularly useful because both are
obtained easily from the virus as a fragment which also contains the SV40 viral origin of
replication. Smaller or larger SV40 fragments may also be used, provided there is included the
approximately 250 bp sequence extending from the HindIII site toward the BglI site located in
the viral origin of replication.

In cases where an adenovirus is used as an expression vector, the coding sequences may
be ligated to an adenovirus transcription/translation control complex, e.g., the late promoter and
tripartite leader sequence. This chimeric gene may then be inserted in the adenovirus genome by
in vitro or in vivo recombination. Insertion in a non-essential region of the viral genome (e.g.,
region E1, E3, or E4) will result in a recombinant virus that is viable and capable of expressing
proteins, polypeptides or peptides in infected hosts.

Specific initiation signals may also be required for efficient translation of protein,
polypeptide or peptide coding sequences. These signals include the ATG initiation codon and
adjacent sequences. Exogenous translational control signals, including the ATG initiation codon,
may additionally need to be provided. One of ordinary skill in the art would readily be capable
of determining this and providing the necessary signals. It is well known that the initiation
codon must be in-frame (or in-phase) with the reading frame of the desired coding sequence to
ensure translation of the entire insert. These exogenous translational control signals and
initiation codons can be of a variety of origins, both natural and synthetic. The efficiency of
expression may be enhanced by the inclusion of appropriate transcription enhancer elements and
transcription terminators.
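
By way of illustration only, the in-frame requirement for an exogenous initiation codon can be
checked computationally. The following Python sketch is not part of the disclosed methods; the
function name, its arguments and the example sequences are hypothetical. It assumes the
construct is given as a DNA string with 0-based positions for the ATG and for the first base of
the desired coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def atg_in_frame(construct: str, atg_pos: int, cds_start: int) -> bool:
    """Return True if the ATG at atg_pos is in-frame with the coding
    sequence starting at cds_start and no stop codon lies between them."""
    seq = construct.upper()
    # The position given must actually hold an ATG initiation codon.
    if seq[atg_pos:atg_pos + 3] != "ATG":
        return False
    # In-frame means the distance to the coding sequence is a multiple of 3.
    if cds_start < atg_pos or (cds_start - atg_pos) % 3 != 0:
        return False
    # Reject linkers that introduce a premature stop codon in that frame.
    for i in range(atg_pos, cds_start, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True

# Hypothetical construct: leader, ATG, 6-nt linker, then the insert.
print(atg_in_frame("GCCACCATGGGATCCGCTAGCAAATAA", 6, 15))   # in-frame
print(atg_in_frame("GCCACCATGGGATCCAGCTAGCAAATAA", 6, 16))  # shifted by 1 nt
```

A check of this kind only addresses the reading frame; it does not evaluate the adjacent
sequences that may also influence initiation efficiency.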

In eukaryotic expression, one will also typically desire to incorporate into the
transcriptional unit an appropriate polyadenylation site (e.g., 5'-AATAAA-3') if one was not
contained within the original cloned segment. Typically, the poly A addition site is placed about
30 to 2000 nucleotides "downstream" of the termination site of the protein at a position prior to
transcription termination.
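
As a hedged illustration of the placement described above, the following Python sketch locates
AATAAA signals downstream of the stop codon and tests whether any falls within about 30 to 2000
nucleotides. The function names, the convention that `stop_codon_end` is the 0-based index of
the first base after the stop codon, and the example sequence are assumptions made for this
sketch only.

```python
def polya_signal_offsets(transcript_unit: str, stop_codon_end: int,
                         signal: str = "AATAAA") -> list:
    """Distances (in nt) from the end of the stop codon to each
    downstream occurrence of the polyadenylation signal."""
    seq = transcript_unit.upper()
    offsets = []
    i = seq.find(signal, stop_codon_end)
    while i != -1:
        offsets.append(i - stop_codon_end)
        i = seq.find(signal, i + 1)
    return offsets

def has_suitably_placed_signal(transcript_unit: str, stop_codon_end: int,
                               min_nt: int = 30, max_nt: int = 2000) -> bool:
    """True if at least one signal lies min_nt..max_nt downstream."""
    return any(min_nt <= d <= max_nt
               for d in polya_signal_offsets(transcript_unit, stop_codon_end))

# Hypothetical unit: short ORF ending at index 9, 40-nt spacer, then the signal.
unit = "ATGGCTTAA" + "C" * 40 + "AATAAA" + "G" * 10
print(polya_signal_offsets(unit, 9))
print(has_suitably_placed_signal(unit, 9))
```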
C. Gene Delivery

The general approach to the aspects of the present invention is to provide a cell with nucleic
acid encoding a fusion protein, polypeptide or peptide and/or a nucleic acid encoding a protein,
polypeptide or peptide whose activity may be altered by complementation with the fusion protein,
thereby permitting a detectable change in the activity of the proteins to take effect. While it is
conceivable that the protein(s) may be delivered directly, a preferred embodiment involves
providing a nucleic acid encoding the protein(s), polypeptide(s) or peptide(s) to the cell. Following
this provision, the polypeptide(s) are synthesized by the transcriptional and translational machinery
of the cell, as well as any that may be provided by the expression construct.

In certain embodiments of the invention, the nucleic acid encoding the gene may be
stably integrated into the genome of the cell. In yet further embodiments, the nucleic acid may
be stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid
segments or "episomes" encode sequences sufficient to permit maintenance and replication
independent of or in synchronization with the host cell cycle. How the expression construct is
delivered to a cell and where in the cell the nucleic acid remains is dependent on the type of
expression construct employed.

1. DNA Delivery Using Viral Vectors

The ability of certain viruses to infect cells or enter cells via receptor-mediated
endocytosis, and to integrate into the host cell genome and express viral genes stably and
efficiently, has made them attractive candidates for the transfer of foreign genes into mammalian
cells. Preferred vectors of the present invention will generally be viral vectors.

Although some viruses that can accept foreign genetic material are limited in the number
of nucleotides they can accommodate and in the range of cells they infect, these viruses have
been demonstrated to successfully effect gene expression. However, adenoviruses do not
integrate their genetic material into the host genome and therefore do not require host replication
for gene expression, making them ideally suited for rapid, efficient, heterologous gene
expression. Techniques for preparing replication-defective infective viruses are well known in
the art.
Of course, in using viral delivery systems, one will desire to purify the virion sufficiently
to render it essentially free of undesirable contaminants, such as defective interfering viral
particles or endotoxins and other pyrogens such that it will not cause any untoward reactions in
the cell, animal or individual receiving the vector construct. A preferred means of purifying the
vector involves the use of buoyant density gradients, such as cesium chloride gradient
centrifugation.

a. Adenoviral Vectors 

A particular method for delivery of the expression constructs involves the use of an
adenovirus expression vector. Although adenovirus vectors are known to have a low capacity
for integration into genomic DNA, this feature is counterbalanced by the high efficiency of gene
transfer afforded by these vectors. "Adenovirus expression vector" is meant to include those
constructs containing adenovirus sequences sufficient to (a) support packaging of the construct
and (b) to ultimately express a tissue or cell-specific construct that has been cloned therein.

The expression vector comprises a genetically engineered form of adenovirus.
Knowledge of the genetic organization of adenovirus, a 36 kb, linear, double-stranded DNA
virus, allows substitution of large pieces of adenoviral DNA with foreign sequences up to 7 kb
(Grunhaus and Horwitz, 1992). In contrast to retrovirus, the adenoviral infection of host cells
does not result in chromosomal integration because adenoviral DNA can replicate in an episomal
manner without potential genotoxicity. Also, adenoviruses are structurally stable, and no
genome rearrangement has been detected after extensive amplification.

Adenovirus is particularly suitable for use as a gene transfer vector because of its mid-sized
genome, ease of manipulation, high titer, wide target-cell range and high infectivity. Both
ends of the viral genome contain 100-200 base pair inverted repeats (ITRs), which are cis
elements necessary for viral DNA replication and packaging. The early (E) and late (L) regions
of the genome contain different transcription units that are divided by the onset of viral DNA
replication. The E1 region (E1A and E1B) encodes proteins responsible for the regulation of
transcription of the viral genome and a few cellular genes. The expression of the E2 region (E2A
and E2B) results in the synthesis of the proteins for viral DNA replication. These proteins are
involved in DNA replication, late gene expression and host cell shut-off (Renan, 1990). The
products of the late genes, including the majority of the viral capsid proteins, are expressed only
after significant processing of a single primary transcript issued by the major late promoter
(MLP). The MLP (located at 16.8 m.u.) is particularly efficient during the late phase of
infection, and all the mRNAs issued from this promoter possess a 5'-tripartite leader (TPL)
sequence which makes them preferred mRNAs for translation.

In a current system, recombinant adenovirus is generated from homologous
recombination between shuttle vector and provirus vector. Due to the possible recombination
between two proviral vectors, wild-type adenovirus may be generated from this process.
Therefore, it is critical to isolate a single clone of virus from an individual plaque and examine
its genomic structure.

Generation and propagation of the current adenovirus vectors, which are replication
deficient, depend on a unique helper cell line, designated 293, which was transformed from
human embryonic kidney cells by Ad5 DNA fragments and constitutively expresses E1 proteins
(E1A and E1B; Graham et al., 1977). Since the E3 region is dispensable from the adenovirus
genome (Jones and Shenk, 1978), the current adenovirus vectors, with the help of 293 cells,
carry foreign DNA in either the E1, the E3 or both regions (Graham and Prevec, 1991).
Recently, adenoviral vectors comprising deletions in the E4 region have been described (U.S.
Patent 5,670,488, incorporated herein by reference).

In nature, adenovirus can package approximately 105% of the wild-type genome
(Ghosh-Choudhury et al., 1987), providing capacity for about 2 extra kb of DNA. Combined with the
approximately 5.5 kb of DNA that is replaceable in the E1 and E3 regions, the maximum
capacity of the current adenovirus vector is under 7.5 kb, or about 15% of the total length of the
vector. More than 80% of the adenovirus viral genome remains in the vector backbone.
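
The capacity figures above can be reproduced with simple arithmetic. The following Python
sketch is offered only as a worked check under the values stated in the text (a 36 kb genome,
packaging of approximately 105% of wild-type length, and approximately 5.5 kb replaceable in
E1 and E3); the function and key names are hypothetical.

```python
def adenovirus_capacity(genome_kb: float = 36.0,
                        packaging_limit: float = 1.05,
                        replaceable_kb: float = 5.5) -> dict:
    """Estimate insert capacity from the figures given in the text."""
    # Extra DNA that fits because ~105% of genome length can be packaged.
    extra_kb = genome_kb * packaging_limit - genome_kb
    # Total room for foreign DNA: extra packaging room plus replaced regions.
    max_insert_kb = extra_kb + replaceable_kb
    # Share of the wild-type genome that is replaced versus retained.
    replaceable_fraction = replaceable_kb / genome_kb
    return {
        "extra_kb": extra_kb,
        "max_insert_kb": max_insert_kb,
        "replaceable_fraction": replaceable_fraction,
        "backbone_fraction": 1.0 - replaceable_fraction,
    }

c = adenovirus_capacity()
print(f"extra capacity: {c['extra_kb']:.1f} kb")          # about 1.8 kb
print(f"maximum insert: {c['max_insert_kb']:.1f} kb")     # about 7.3 kb
print(f"replaceable:    {c['replaceable_fraction']:.0%}")
print(f"backbone:       {c['backbone_fraction']:.0%}")
```

Under these assumptions the extra packaging room is about 1.8 kb, which the text rounds to
about 2 kb, so the combined capacity falls just under 7.5 kb while roughly 85% of the genome
remains in the backbone.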

Helper cell lines may be derived from human cells such as human embryonic kidney
cells, muscle cells, hematopoietic cells or other human embryonic mesenchymal or epithelial
cells. Alternatively, the helper cells may be derived from the cells of other mammalian species
that are permissive for human adenovirus. Such cells include, e.g., Vero cells or other monkey
embryonic mesenchymal or epithelial cells. As stated above, the preferred helper cell line is 293.

Racher et al. (1995) disclosed improved methods for culturing 293 cells and propagating
adenovirus. In one format, natural cell aggregates are grown by inoculating individual cells into
1 liter siliconized spinner flasks (Techne, Cambridge, UK) containing 100-200 ml of medium.
Following stirring at 40 rpm, the cell viability is estimated with trypan blue. In another format,
Fibra-Cel microcarriers (Bibby Sterlin, Stone, UK) (5 g/l) are employed as follows. A cell
inoculum, resuspended in 5 ml of medium, is added to the carrier (50 ml) in a 250 ml Erlenmeyer
flask and left stationary, with occasional agitation, for 1 to 4 h. The medium is then replaced
with 50 ml of fresh medium and shaking initiated. For virus production, cells are allowed to
grow to about 80% confluence, after which time the medium is replaced (to 25% of the final
volume) and adenovirus added at an MOI of 0.05. Cultures are left stationary overnight,
following which the volume is increased to 100% and shaking commenced for another 72 h.
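
For orientation, the quantities implied by the infection step above (an MOI of 0.05 and medium
replaced to 25% of the final volume) can be computed as sketched below in Python. The cell
count and stock titer used in the example are hypothetical values chosen for illustration and
are not taken from Racher et al.; the function names are likewise assumptions.

```python
def virus_inoculum_pfu(cell_count: float, moi: float = 0.05) -> float:
    """Plaque-forming units needed: MOI is infectious units per cell."""
    return cell_count * moi

def inoculum_volume_ml(cell_count: float, stock_titer_pfu_per_ml: float,
                       moi: float = 0.05) -> float:
    """Volume of virus stock that delivers the required PFU."""
    return virus_inoculum_pfu(cell_count, moi) / stock_titer_pfu_per_ml

def infection_medium_volume_ml(final_volume_ml: float,
                               fraction: float = 0.25) -> float:
    """Medium volume during infection, as a fraction of the final volume."""
    return final_volume_ml * fraction

# Hypothetical culture: 1e8 cells, stock at 1e10 pfu/ml, 50 ml final volume.
cells = 1e8
print(virus_inoculum_pfu(cells))            # PFU required at MOI 0.05
print(inoculum_volume_ml(cells, 1e10))      # ml of stock to add
print(infection_medium_volume_ml(50.0))     # ml of medium during infection
```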

Other than the requirement that the adenovirus vector be replication defective, or at least
conditionally defective, the nature of the adenovirus vector is not believed to be crucial to the
successful practice of the invention. The adenovirus may be of any of the 42 different known
serotypes or subgroups A-F. Adenovirus type 5 of subgroup C is the preferred starting material
in order to obtain the conditional replication-defective adenovirus vector for use in the present
invention. This is because Adenovirus type 5 is a human adenovirus about which a great deal of
biochemical and genetic information is known, and it has historically been used for most
constructions employing adenovirus as a vector.

As stated above, the typical vector according to the present invention is replication
defective and will not have an adenovirus E1 region. Thus, it will be most convenient to
introduce the transforming construct at the position from which the E1-coding sequences have
been removed. However, the position of insertion of the construct within the adenovirus
sequences is not critical to the invention. The polynucleotide encoding the gene of interest may
also be inserted in lieu of the deleted E3 region in E3 replacement vectors as described by
Karlsson et al. (1986) or in the E4 region where a helper cell line or helper virus complements
the E4 defect.

Adenovirus growth and manipulation are known to those of skill in the art, and adenovirus exhibits
a broad host range in vitro and in vivo. This group of viruses can be obtained in high titers, e.g.,
10^9 to 10^11 plaque-forming units per ml, and they are highly infective. The life cycle of
adenovirus does not require integration into the host cell genome. The foreign genes delivered 
by adenovirus vectors are episomal and, therefore, have low genotoxicity to host cells. 

Adenovirus vectors have been used in eukaryotic gene expression (Levrero et al., 1991;
Gomez-Foix et al., 1992) and vaccine development (Grunhaus and Horwitz, 1992; Graham and
Prevec, 1992). Recombinant adenovirus and adeno-associated virus (see below) can both infect
and transduce non-dividing human primary cells.

b. AAV Vectors

Adeno-associated virus (AAV) is an attractive vector system for use in the cell
transduction of the present invention as it has a high frequency of integration and it can infect
nondividing cells, thus making it useful for delivery of genes into mammalian cells, for example,
in tissue culture (Muzyczka, 1992) or in vivo. AAV has a broad host range for infectivity
(Tratschin et al., 1984; Laughlin et al., 1986; Lebkowski et al., 1988; McLaughlin et al., 1988).
Details concerning the generation and use of rAAV vectors are described in U.S. Patent No.
5,139,941 and U.S. Patent No. 4,797,368, each incorporated herein by reference.

Studies demonstrating the use of AAV in gene delivery include LaFace et al. (1988);
Zhou et al. (1993); Flotte et al. (1993); and Walsh et al. (1994). Recombinant AAV vectors
have been used successfully for in vitro and in vivo transduction of marker genes (Kaplitt et al.,
1994; Lebkowski et al., 1988; Samulski et al., 1989; Yoder et al., 1994; Zhou et al., 1994;
Hermonat and Muzyczka, 1984; Tratschin et al., 1985; McLaughlin et al., 1988) and genes
involved in human diseases (Flotte et al., 1992; Luo et al., 1994; Ohi et al., 1990; Walsh et al.,
1994; Wei et al., 1994). Recently, an AAV vector has been approved for phase I human trials
for the treatment of cystic fibrosis.
AAV is a dependent parvovirus in that it requires coinfection with another virus (either
adenovirus or a member of the herpes virus family) to undergo a productive infection in cultured
cells (Muzyczka, 1992). In the absence of coinfection with helper virus, the wild type AAV
genome integrates through its ends into human chromosome 19 where it resides in a latent state
as a provirus (Kotin et al., 1990; Samulski et al., 1991). rAAV, however, is not restricted to
chromosome 19 for integration unless the AAV Rep protein is also expressed (Shelling and
Smith, 1994). When a cell carrying an AAV provirus is superinfected with a helper virus, the
AAV genome is "rescued" from the chromosome or from a recombinant plasmid, and a normal
productive infection is established (Samulski et al., 1989; McLaughlin et al., 1988; Kotin et al.,
1990; Muzyczka, 1992).

Typically, recombinant AAV (rAAV) virus is made by cotransfecting a plasmid
containing the gene of interest flanked by the two AAV terminal repeats (McLaughlin et al.,
1988; Samulski et al., 1989; each incorporated herein by reference) and an expression plasmid
containing the wild type AAV coding sequences without the terminal repeats, for example
pIM45 (McCarty et al., 1991; incorporated herein by reference). The cells are also infected or
transfected with adenovirus or plasmids carrying the adenovirus genes required for AAV helper
function. rAAV virus stocks made in such fashion are contaminated with adenovirus which must
be physically separated from the rAAV particles (for example, by cesium chloride density
centrifugation). Alternatively, adenovirus vectors containing the AAV coding regions or cell
lines containing the AAV coding regions and some or all of the adenovirus helper genes could be
used (Yang et al., 1994; Clark et al., 1995). Cell lines carrying the rAAV DNA as an integrated
provirus can also be used (Flotte et al., 1995).

c. Retroviral Vectors

Retroviruses have promise as gene delivery vectors due to their ability to integrate their
genes into the host genome, transferring a large amount of foreign genetic material, infecting a
broad spectrum of species and cell types and of being packaged in special cell-lines (Miller,
1992).
The retroviruses are a group of single-stranded RNA viruses characterized by an ability
to convert their RNA to double-stranded DNA in infected cells by a process of
reverse-transcription (Coffin, 1990). The resulting DNA then stably integrates into cellular
chromosomes as a provirus and directs synthesis of viral proteins. The integration results in the
retention of the viral gene sequences in the recipient cell and its descendants. The retroviral
genome contains three genes, gag, pol, and env that code for capsid proteins, polymerase
enzyme, and envelope components, respectively. A sequence found upstream from the gag gene
contains a signal for packaging of the genome into virions. Two long terminal repeat (LTR)
sequences are present at the 5' and 3' ends of the viral genome. These contain strong promoter
and enhancer sequences and are also required for integration in the host cell genome (Coffin,
1990).

In order to construct a retroviral vector, a nucleic acid encoding a gene of interest is
inserted into the viral genome in the place of certain viral sequences to produce a virus that is
replication-defective. In order to produce virions, a packaging cell line containing the gag, pol,
and env genes but without the LTR and packaging components is constructed (Mann et al.,
1983). When a recombinant plasmid containing a cDNA, together with the retroviral LTR and
packaging sequences is introduced into this cell line (by calcium phosphate precipitation for
example), the packaging sequence allows the RNA transcript of the recombinant plasmid to be
packaged into viral particles, which are then secreted into the culture media (Nicolas and
Rubenstein, 1988; Temin, 1986; Mann et al., 1983). The media containing the recombinant
retroviruses is then collected, optionally concentrated, and used for gene transfer. Retroviral
vectors are able to infect a broad variety of cell types. However, integration and stable
expression require the division of host cells (Paskind et al., 1975).

A concern with the use of defective retrovirus vectors is the potential appearance of
wild-type replication-competent virus in the packaging cells. This can result from recombination
events in which the intact sequence from the recombinant virus inserts upstream from the gag,
pol, env sequence integrated in the host cell genome. However, new packaging cell lines are
now available that should greatly decrease the likelihood of recombination (Markowitz et al.,
1988; Hersdorffer et al., 1990).

Gene delivery using second generation retroviral vectors has been reported. Kasahara
et al. (1994) prepared an engineered variant of the Moloney murine leukemia virus, which
normally infects only mouse cells, and modified an envelope protein so that the virus specifically
bound to, and infected, human cells bearing the erythropoietin (EPO) receptor. This was
achieved by inserting a portion of the EPO sequence into an envelope protein to create a
chimeric protein with a new binding specificity.

d. Other Viral Vectors 

Other viral vectors may be employed as expression constructs in the present invention.
Vectors derived from viruses such as vaccinia virus (Ridgeway, 1988; Baichwal and Sugden,
1986; Coupar et al., 1988), sindbis virus, cytomegalovirus and herpes simplex virus may be
employed. They offer several attractive features for various mammalian cells (Friedmann, 1989;
Ridgeway, 1988; Baichwal and Sugden, 1986; Coupar et al., 1988; Horwich et al., 1990).

With the recent recognition of defective hepatitis B viruses, new insight was gained into
the structure-function relationship of different viral sequences. In vitro studies showed that the
virus could retain the ability for helper-dependent packaging and reverse transcription despite the
deletion of up to 80% of its genome (Horwich et al., 1990). This suggested that large portions of
the genome could be replaced with foreign genetic material. Chang et al. recently introduced the
chloramphenicol acetyltransferase (CAT) gene into duck hepatitis B virus genome in the place of
the polymerase, surface, and pre-surface coding sequences. It was cotransfected with wild-type
virus into an avian hepatoma cell line. Culture media containing high titers of the recombinant
virus were used to infect primary duckling hepatocytes. Stable CAT gene expression was
detected for at least 24 days after transfection (Chang et al., 1991).

In certain further embodiments, the vector will be HSV. A factor that makes HSV an
attractive vector is the size and organization of the genome. Because HSV is large, incorporation
of multiple genes or expression cassettes is less problematic than in other smaller viral systems.
In addition, the availability of different viral control sequences with varying performance
(temporal, strength, etc.) makes it possible to control expression to a greater extent than in other
systems. It also is an advantage that the virus has relatively few spliced messages, further easing
genetic manipulations. HSV also is relatively easy to manipulate and can be grown to high titers.
Thus, delivery is less of a problem, both in terms of volumes needed to attain sufficient MOI and
in a lessened need for repeat dosings.

e. Modified Viruses 
In still further embodiments of the present invention, the nucleic acids to be delivered are
housed within an infective virus that has been engineered to express a specific binding ligand.
The virus particle will thus bind specifically to the cognate receptors of the target cell and deliver
the contents to the cell. A novel approach designed to allow specific targeting of retrovirus
vectors was recently developed based on the chemical modification of a retrovirus by the
chemical addition of lactose residues to the viral envelope. This modification can permit the
specific infection of hepatocytes via asialoglycoprotein receptors.

Another approach to targeting of recombinant retroviruses was designed in which
biotinylated antibodies against a retroviral envelope protein and against a specific cell receptor
were used. The antibodies were coupled via the biotin components by using streptavidin (Roux
et al., 1989). Using antibodies against major histocompatibility complex class I and class II
antigens, they demonstrated the infection of a variety of human cells that bore those surface
antigens with an ecotropic virus in vitro (Roux et al., 1989).

2. Other Methods of DNA Delivery 

In various embodiments of the invention, DNA is delivered to a cell as an expression
construct. In order to effect expression of a gene construct, the expression construct must be
delivered into a cell. As described herein, the preferred mechanism for delivery is via viral
infection, where the expression construct is encapsidated in an infectious viral particle.
However, several non-viral methods for the transfer of expression constructs into cells also are
contemplated by the present invention. In one embodiment of the present invention, the
expression construct may consist only of naked recombinant DNA or plasmids. Transfer of the
construct may be performed by any of the methods mentioned which physically or chemically
permeabilize the cell membrane. Some of these techniques may be successfully adapted for in vivo
or ex vivo use, as discussed below.

a. Liposome-Mediated Transfection 

In a further embodiment of the invention, the expression construct may be entrapped in a
liposome. Liposomes are vesicular structures characterized by a phospholipid bilayer membrane
and an inner aqueous medium. Multilamellar liposomes have multiple lipid layers separated by
aqueous medium. They form spontaneously when phospholipids are suspended in an excess of
aqueous solution. The lipid components undergo self-rearrangement before the formation of
closed structures and entrap water and dissolved solutes between the lipid bilayers (Ghosh and
Bachhawat, 1991). Also contemplated is an expression construct complexed with Lipofectamine
(GibcoBRL).

Liposome-mediated nucleic acid delivery and expression of foreign DNA in vitro has
been very successful (Nicolau and Sene, 1982; Fraley et al., 1979; Nicolau et al., 1987). Wong
et al. (1980) demonstrated the feasibility of liposome-mediated delivery and expression of
foreign DNA in cultured chick embryo, HeLa and hepatoma cells.

In certain embodiments of the invention, the liposome may be complexed with a
hemagglutinating virus (HVJ). This has been shown to facilitate fusion with the cell membrane
and promote cell entry of liposome-encapsulated DNA (Kaneda et al., 1989). In other
embodiments, the liposome may be complexed or employed in conjunction with nuclear
non-histone chromosomal proteins (HMG-1) (Kato et al., 1991). In yet further embodiments, the
liposome may be complexed or employed in conjunction with both HVJ and HMG-1. In other
embodiments, the delivery vehicle may comprise a ligand and a liposome. Where a bacterial
promoter is employed in the DNA construct, it also will be desirable to include within the liposome
an appropriate bacterial polymerase.

b. Electroporation

In certain embodiments of the present invention, the expression construct is introduced 
into the cell via electroporation. Electroporation involves the exposure of a suspension of cells 
and DNA to a high-voltage electric discharge. 

Transfection of eukaryotic cells using electroporation has been quite successful. Mouse
pre-B lymphocytes have been transfected with human kappa-immunoglobulin genes (Potter
et al., 1984), and rat hepatocytes have been transfected with the chloramphenicol
acetyltransferase gene (Tur-Kaspa et al., 1986) in this manner.

c. Calcium Phosphate or DEAE-Dextran 

In other embodiments of the present invention, the expression construct is introduced to
the cells using calcium phosphate precipitation. Human KB cells have been transfected with
adenovirus 5 DNA (Graham and Van Der Eb, 1973) using this technique. Also in this manner,
mouse L(A9), mouse C127, CHO, CV-1, BHK, NIH3T3 and HeLa cells were transfected with a
neomycin marker gene (Chen and Okayama, 1987), and rat hepatocytes were transfected with a
variety of marker genes (Rippe et al., 1990).

In another embodiment, the expression construct is delivered into the cell using
DEAE-dextran followed by polyethylene glycol. In this manner, reporter plasmids were introduced
into mouse myeloma and erythroleukemia cells (Gopal, 1985).

d. Particle Bombardment 

Another embodiment of the invention for transferring a naked DNA expression construct
into cells may involve particle bombardment. This method depends on the ability to accelerate
DNA-coated microprojectiles to a high velocity allowing them to pierce cell membranes and
enter cells without killing them (Klein et al., 1987). Several devices for accelerating small
particles have been developed. One such device relies on a high voltage discharge to generate an
electrical current, which in turn provides the motive force (Yang et al., 1990). The
microprojectiles used have consisted of biologically inert substances such as tungsten or gold
beads.

e. Direct Microinjection or Sonication Loading

Further embodiments of the present invention include the introduction of the expression
construct by direct microinjection or sonication loading. Direct microinjection has been used to
introduce nucleic acid constructs into Xenopus oocytes (Harland and Weintraub, 1985), and
LTK- fibroblasts have been transfected with the thymidine kinase gene by sonication loading
(Fechheimer et al., 1987).

f. Adenoviral Assisted Transfection 

In certain embodiments of the present invention, the expression construct is introduced
into the cell using adenovirus assisted transfection. Increased transfection efficiencies have been
reported in cell systems using adenovirus coupled systems (Kelleher and Vos, 1994; Cotten
et al., 1992; Curiel, 1994).

g. Receptor Mediated Transfection

Still further expression constructs that may be employed to deliver nucleic acid constructs
to target cells are receptor-mediated delivery vehicles. These take advantage of the selective
uptake of macromolecules by receptor-mediated endocytosis that will be occurring in the target
cells. In view of the cell type-specific distribution of various receptors, this delivery method
adds another degree of specificity to the present invention. Specific delivery in the context of
another mammalian cell type is described by Wu and Wu (1993; incorporated herein by
reference).

Certain receptor-mediated gene targeting vehicles comprise a cell receptor-specific ligand
and a DNA-binding agent. Others comprise a cell receptor-specific ligand to which the DNA
construct to be delivered has been operatively attached. Several ligands have been used for
receptor-mediated gene transfer (Wu and Wu, 1987; Wagner et al., 1990; Perales et al., 1994;
Myers, EPO 0273085), which establishes the operability of the technique. In certain aspects of
the present invention, the ligand will be chosen to correspond to a receptor specifically expressed
on the EOE target cell population.

In other embodiments, the DNA delivery vehicle component of a cell-specific gene
targeting vehicle may comprise a specific binding ligand in combination with a liposome. The
nucleic acids to be delivered are housed within the liposome and the specific binding ligand is
functionally incorporated into the liposome membrane. The liposome will thus specifically bind
to the receptors of the target cell and deliver the contents to the cell. Such systems have been
shown to be functional using systems in which, for example, epidermal growth factor (EGF) is
used in the receptor-mediated delivery of a nucleic acid to cells that exhibit upregulation of the
EGF receptor.

In still further embodiments, the DNA delivery vehicle component of the targeted
delivery vehicles may be a liposome itself, which will preferably comprise one or more lipids or
glycoproteins that direct cell-specific binding. For example, Nicolau et al. (1987) employed
lactosyl-ceramide, a galactose-terminal asialoganglioside, incorporated into liposomes and
observed an increase in the uptake of the insulin gene by hepatocytes. It is contemplated that the
tissue-specific transforming constructs of the present invention can be specifically delivered into
the target cells in a similar manner.

h. Homologous Recombination 
Homologous recombination (Koller and Smithies, 1992) allows the precise modification
of existing genes, overcomes the problems of positional effects and insertional inactivation, and
allows the inactivation of specific genes, as well as the replacement of one gene for another.
Methods for homologous recombination are described in U.S. Patent 5,614,396, incorporated
herein in its entirety by reference.

Thus a preferred method for the delivery of transgenic constructs involves the use of
homologous recombination. Homologous recombination relies, like antisense, on the tendency
of nucleic acids to base pair with complementary sequences. In this instance, the base pairing
serves to facilitate the interaction of two separate nucleic acid molecules so that strand breakage
and repair can take place. In other words, the "homologous" aspect of the method relies on
sequence homology to bring two complementary sequences into close proximity, while the
"recombination" aspect provides for one complementary sequence to replace the other by virtue
of the breaking of certain bonds and the formation of others.



Put into practice, homologous recombination is used as follows. First, a site for
integration is selected within the host cell. Sequences homologous to the integration site are then
included in a genetic construct, flanking the selected gene to be integrated into the genome.
Flanking, in this context, simply means that target homologous sequences are located both
upstream (5') and downstream (3') of the selected gene. These sequences should correspond to
some sequences upstream and downstream of the target gene. The construct is then introduced
into the cell, thus permitting recombination between the cellular sequences and the construct.

As a practical matter, the genetic construct will normally act as far more than a vehicle to
insert the gene into the genome. For example, it is important to be able to select for
recombinants and, therefore, it is common to include within the construct a selectable marker
gene. This gene permits selection of cells that have integrated the construct into their genomic
DNA by conferring resistance to various biostatic and biocidal drugs. In addition, this technique
may be used to "knock-out" (delete) or interrupt a particular gene. Thus, another approach for
altering or mutating a gene involves the use of homologous recombination, or "knock-out
technology". This is accomplished by including a mutated or vastly deleted form of the
heterologous gene between the flanking regions within the construct. The arrangement of a
construct to effect homologous recombination might be as follows:

...vector•5'-flanking sequence•selected gene•selectable marker gene•flanking sequence-3'•vector...

Thus, using this kind of construct, it is possible, in a single recombinatorial event, to (i)
"knock out" an endogenous gene, (ii) provide a selectable marker for identifying such an event
and (iii) introduce a transgene for expression.
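
The arrangement described above can be modeled as an ordered list of elements. The following
Python sketch is illustrative only: the class Element, the helper functions and the placeholder
sequences are hypothetical names and values introduced for this example and are not taken from
any cited reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One named segment of a targeting construct (hypothetical model)."""
    name: str
    sequence: str


def assemble_construct(five_flank, selected_gene, marker, three_flank):
    """Return the elements in the order 5'-flank, gene, marker, 3'-flank.

    Raises ValueError if either flanking sequence is empty, because homology
    is needed on both sides of the material to be integrated.
    """
    if not five_flank.sequence or not three_flank.sequence:
        raise ValueError("both flanking sequences are required")
    return [five_flank, selected_gene, marker, three_flank]


def insert_between_flanks(elements):
    """Concatenate everything located between the two flanking sequences."""
    return "".join(e.sequence for e in elements[1:-1])


# Placeholder sequences, chosen only to make the example runnable.
construct = assemble_construct(
    Element("5'-flanking sequence", "ATGCATGC"),
    Element("selected gene", "GGGAAACCC"),
    Element("selectable marker gene", "TTTGGGTTT"),
    Element("3'-flanking sequence", "CGTACGTA"),
)
layout = " | ".join(e.name for e in construct)
```

In this simplified model, only the material returned by insert_between_flanks would be
expected to replace the corresponding cellular sequence after recombination at both flanks.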

Another refinement of the homologous recombination approach involves the use of a
"negative" selectable marker. One example is the use of the cytosine deaminase gene in a
negative selection method as described in U.S. Patent No. 5,624,830. The negative selection
marker, unlike the selectable marker, causes death of cells which express the marker. Thus, it is
used to identify undesirable recombination events. When seeking to select homologous
recombinants using a selectable marker, it is difficult in the initial screening step to identify
proper homologous recombinants from recombinants generated from random, non-sequence
specific events. These recombinants also may contain the selectable marker gene and may
express the heterologous protein of interest, but will, in all likelihood, not have the desired
phenotype. By attaching a negative selectable marker to the construct, but outside of the
flanking regions, one can select against many random recombination events that will incorporate
the negative selectable marker. Homologous recombination should not introduce the negative
selectable marker, as it is outside of the flanking sequences.
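
The logic of combining a selectable marker with a negative selectable marker can be
summarized in a few lines. The Python sketch below is a simplified truth-table model under the
assumption that marker presence fully determines survival; the function name and the example
clones are hypothetical.

```python
def survives_selection(has_selectable_marker: bool, has_negative_marker: bool) -> bool:
    """Model of combined selection.

    A clone survives only if it carries the selectable marker (it resists the
    selective drug) and lacks the negative marker (it is not killed by it).
    """
    return has_selectable_marker and not has_negative_marker


# Hypothetical clones: (description, selectable marker present, negative marker present)
clones = [
    ("homologous recombinant", True, False),
    ("random integrant, whole construct", True, True),
    ("random integrant, negative marker lost", True, False),
    ("no integration", False, False),
]

survivors = [name for name, pos, neg in clones if survives_selection(pos, neg)]
```

The third clone illustrates why the approach removes many, but not necessarily all, random
recombination events: a random integrant that has lost the negative marker is not
distinguished from a homologous recombinant by this selection alone.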

3. Marker genes

In certain aspects of the present invention, specific cells are tagged with specific genetic
markers to provide information about the fate of the tagged cells. Therefore, the present
invention also provides recombinant candidate screening and selection methods which are based
upon whole cell assays and which, preferably, employ a reporter gene that confers on its
recombinant hosts a readily detectable phenotype that emerges only under conditions where a
general DNA promoter positioned upstream of the reporter gene is functional. Generally,
reporter genes encode a polypeptide (marker protein) not otherwise produced by the host cell
which is detectable by analysis of the cell culture, e.g., by fluorometric, radioisotopic or
spectrophotometric analysis of the cell culture.

In other aspects of the present invention, a genetic marker is provided which is detectable
by standard genetic analysis techniques, such as DNA amplification by PCR™ or hybridization
using fluorometric, radioisotopic or spectrophotometric probes.

a. Screening 

Exemplary enzymes include esterases, phosphatases, proteases (tissue plasminogen
activator or urokinase) and other enzymes capable of being detected by their activity, as will be
known to those skilled in the art. Contemplated for use in the present invention is green
fluorescent protein (GFP) as a marker for transgene expression (Chalfie et al., 1994). The use of
GFP does not need exogenously added substrates, only irradiation by near UV or blue light, and
thus has significant potential for use in monitoring gene expression in living cells.

Other particular examples are the enzyme chloramphenicol acetyltransferase (CAT),
which may be employed with a radiolabelled substrate, firefly and bacterial luciferase, and the
bacterial enzymes β-galactosidase and β-glucuronidase. Other marker genes within this class are
well known to those of skill in the art, and are suitable for use in the present invention.

b. Selection

Another class of reporter genes which confer detectable characteristics on a host cell are
those which encode polypeptides, generally enzymes, which render their transformants resistant
against toxins. Examples of this class of reporter genes are the neo gene (Colberre-Garapin
et al., 1981) which protects host cells against toxic levels of the antibiotic G418, the gene
conferring streptomycin resistance (U.S. Patent 4,430,434), the gene conferring hygromycin B
resistance (Santerre et al., 1984; U.S. Patents 4,727,028, 4,960,704 and 4,559,302), a gene
encoding dihydrofolate reductase, which confers resistance to methotrexate (Alt et al., 1978), the
enzyme HPRT, along with many others well known in the art (Kaufman, 1990).

D. Culture System

For long-term, high-yield production of a recombinant protein, polypeptide or peptide,
stable expression is preferred. For example, cell lines that stably express constructs encoding a
protein, polypeptide or peptide may be engineered. Rather than using expression vectors that
contain viral origins of replication, host cells can be transformed with vectors controlled by
appropriate expression control elements (e.g., promoter, enhancer sequences, transcription
terminators, polyadenylation sites, etc.), and a selectable marker. Following the introduction of
foreign DNA, engineered cells may be allowed to grow for 1-2 days in an enriched media, and
then are switched to a selective media. The selectable marker in the recombinant plasmid
confers resistance to the selection and allows cells to stably integrate the plasmid into their
chromosomes and grow to form foci which in turn can be cloned and expanded into cell lines.

A number of selection systems may be used, including, but not limited to, the herpes
simplex virus thymidine kinase (tk), hypoxanthine-guanine phosphoribosyltransferase (hgprt)
and adenine phosphoribosyltransferase (aprt) genes, in tk-, hgprt- or aprt- cells, respectively.
Also, antimetabolite resistance can be used as the basis of selection for dihydrofolate reductase
(dhfr), that confers resistance to methotrexate; gpt, that confers resistance to mycophenolic acid;
neomycin (neo), that confers resistance to the aminoglycoside G-418; and hygromycin (hygro),
that confers resistance to hygromycin.

Animal cells can be propagated in vitro in two modes: as non-anchorage dependent cells
growing in suspension throughout the bulk of the culture or as anchorage-dependent cells
requiring attachment to a solid substrate for their propagation (i.e., a monolayer type of cell
growth).

Non-anchorage dependent or suspension cultures from continuous established cell lines
are the most widely used means of large scale production of cells and cell products. However,
suspension cultured cells have limitations, such as tumorigenic potential and lower protein
production than adherent cells.

Large scale suspension culture of mammalian cells in stirred tanks is a common method
for production of recombinant proteins. Two suspension culture reactor designs are in wide use:
the stirred reactor and the airlift reactor. The stirred design has successfully been used on an
8000 liter capacity for the production of interferon. Cells are grown in a stainless steel tank with
a height-to-diameter ratio of 1:1 to 3:1. The culture is usually mixed with one or more agitators,
based on bladed disks or marine propeller patterns. Agitator systems offering less shear forces
than blades have been described. Agitation may be driven either directly or indirectly by
magnetically coupled drives. Indirect drives reduce the risk of microbial contamination through
seals on stirrer shafts.

The airlift reactor, also initially described for microbial fermentation and later adapted for
mammalian culture, relies on a gas stream to both mix and oxygenate the culture. The gas
stream enters a riser section of the reactor and drives circulation. Gas disengages at the culture
surface, causing denser liquid free of gas bubbles to travel downward in the downcomer section
of the reactor. The main advantage of this design is the simplicity and lack of need for
mechanical mixing. Typically, the height-to-diameter ratio is 10:1. The airlift reactor scales up
relatively easily, has good mass transfer of gases and generates relatively low shear forces.

It is contemplated that the proteins, polypeptides or peptides of the invention may be
"overexpressed", i.e., expressed in increased levels relative to their natural expression in cells.
Such overexpression may be assessed by a variety of methods, including radio-labeling and/or
protein purification. However, simple and direct methods are preferred, for example, those
involving SDS/PAGE and protein staining or western blotting, followed by quantitative analyses,
such as densitometric scanning of the resultant gel or blot. A specific increase in the level of the
recombinant protein or peptide in comparison to the level in natural cells is indicative of
overexpression, as is a relative abundance of the specific protein in relation to the other proteins
produced by the host cell and, e.g., visible on a gel.
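
As an illustration of the quantitative comparison described above, the following Python sketch
computes a fold increase from densitometric band intensities. Normalizing each band to a
loading-control band from the same lane is an assumption made for this example and is not
specified in the text; the function names and the intensity values are hypothetical.

```python
def normalized_intensity(band: float, loading_control: float) -> float:
    """Band intensity divided by the loading-control intensity of the same lane."""
    if loading_control <= 0:
        raise ValueError("loading control intensity must be positive")
    return band / loading_control


def fold_overexpression(recombinant_band: float, recombinant_control: float,
                        natural_band: float, natural_control: float) -> float:
    """Ratio of normalized recombinant signal to normalized natural-cell signal."""
    natural = normalized_intensity(natural_band, natural_control)
    if natural == 0:
        raise ValueError("natural-cell band is undetectable; a ratio is undefined")
    return normalized_intensity(recombinant_band, recombinant_control) / natural


# Hypothetical densitometry readings in arbitrary units.
fold = fold_overexpression(1200.0, 400.0, 150.0, 300.0)  # (1200/400) / (150/300)
```

A value greater than 1 would be consistent with overexpression in this simple model; what
counts as a specific increase depends on the variability of the assay.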

E. Complementation

The terms "structural complementation", "complementation" or "alpha complementation"
as used herein in certain embodiments refer to the ability of at least one polypeptide comprising a
protein fragment or domain to alter the activity of at least a second polypeptide comprising a
protein fragment or domain. In certain embodiments, the at least one polypeptide and the at least
second polypeptide are derived from the same precursor protein sequence. A non-limiting
example of this is the complementation of β-galactosidase's activity that occurs when the
α-fragment and the ω-fragment of β-galactosidase interact to produce an active β-galactosidase
enzymatic complex.

Other complementing protein fragments are known in the art. Non-limiting examples
include the P. falciparum thymidylate synthase and dihydrofolate reductase domains (Shallom et
al., 1999), and the alpha and beta subunits of the mitochondrial processing peptidase of different
species (Adamec et al., 1999), whose activity was detected by the use of temperature sensitive
mutant yeast strains.

Thus, it is contemplated that various peptide or polypeptide sequences may be used to
produce fusion proteins with a target protein, so that the folding of the target protein into a
soluble form can be detected via the change in activity of the complemented peptide or
polypeptide. It is also contemplated that additional complementing fragments of commonly used
or well known selectable or screenable markers may be made for use in the present invention.
Non-limiting examples of such markers include a target binding protein, such as ubiquitin; an
enzyme, such as β-galactosidase, cytochrome c, chymotrypsin inhibitor, RNase,
phosphoglycerate kinase, invertase, staphylococcal nuclease, thioredoxin C, lactose permease,
aminoacyl tRNA synthetase, or dihydrofolate reductase; a protein inhibitor, a fluorophore or a
chromophore, such as green fluorescent protein, blue fluorescent protein, yellow fluorescent
protein, luciferase or aequorin.

It is contemplated that one or more fragments of such markers may be produced through
recombinant technology that is well known to those of skill in the art, to produce a
complementation system for assaying protein folding as described herein. In a non-limiting
example, a nucleic acid encoding an N-terminal sequence of about 250 amino acids or less of a
marker protein may be operatively associated with a nucleic acid of a protein of interest to be
folded into soluble form. Such nucleic acids may be used to construct an expression vector as
described herein, and used to complement a cell that expresses the C-terminal sequence
of the marker protein. In an alternative non-limiting example, a nucleic acid encoding a
C-terminal sequence of about 250 amino acids or less of a marker protein may be operatively
associated with a nucleic acid of a protein of interest to be folded into soluble form. Such
nucleic acids may be used to construct an expression vector as described herein, and used to
complement a cell that expresses the N-terminal sequence of the marker protein. Of
course, one of skill in the art may design nucleic acids encoding marker gene fragments of
various lengths. In certain embodiments, the marker gene fragment may encode a polypeptide or
peptide of less than about 200, about 150, about 100, about 99, about 98, about 97, about 96,
about 95, about 94, about 93, about 92, about 91, about 90, about 89, about 88, about 87, about
86, about 85, about 84, about 83, about 82, about 81, about 80, about 79, about 78, about 77,
about 76, about 75, about 74, about 73, about 72, about 71, about 70, about 69, about 68, about
67, about 66, about 65, about 64, about 63, about 62, about 61, about 60, about 59, about 58,
about 57, about 56, about 55, about 54, about 53, about 52, about 51, about 50, about 49, about
48, about 47, about 46, about 45, about 44, about 43, about 42, about 41, about 40, about 39,
about 38, about 37, about 36, about 35, about 34, about 33, about 32, about 31, about 30, about
29, about 28, about 27, about 26, about 25, about 24, about 23, about 22, about 21, about 20,
about 19, about 18, about 17, about 16, about 15, about 14, about 13, about 12, about 11, about
10, about 9, about 8, about 7, about 6, about 5, to about 4 amino acids, which is operatively
associated with the nucleic acid encoding the protein that is soluble when folded correctly.
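
The choice of an N-terminal or C-terminal marker segment of a given length can be sketched as
follows. This Python example is a deliberate simplification: it treats the two segments as a
clean partition of one marker sequence, which need not hold for real complementing fragments,
and both the function split_marker and the placeholder sequence are hypothetical.

```python
def split_marker(marker_sequence: str, first_segment_length: int,
                 n_terminal: bool = True) -> tuple:
    """Split a marker protein sequence into two segments.

    Returns (fusion_segment, complementing_segment). The fusion segment is the
    short piece to be fused to the protein of interest; the complementing
    segment is the remainder, to be expressed separately in the host cell.
    """
    if not 0 < first_segment_length < len(marker_sequence):
        raise ValueError("segment length must be between 1 and len(sequence) - 1")
    if n_terminal:
        return (marker_sequence[:first_segment_length],
                marker_sequence[first_segment_length:])
    return (marker_sequence[-first_segment_length:],
            marker_sequence[:-first_segment_length])


# Placeholder 20-residue sequence; not a real marker protein.
placeholder = "ACDEFGHIKLMNPQRSTVWY"
n_fusion, c_complement = split_marker(placeholder, 5)
c_fusion, n_complement = split_marker(placeholder, 5, n_terminal=False)
```

In either orientation the short segment would be encoded in frame with the protein of interest,
while the longer segment would be supplied by the host cell.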

F. Screening Assays 

The present invention is directed to the use of an α-complementation system to screen for
various aspects of protein folding and/or solubility. As discussed above, an important aspect of the
invention is the use of a fusion protein that contains sequences from the protein of interest as
well as a portion of a marker protein. The marker protein, in the context of the fusion, is
incapable of exhibiting its detectable phenotype. However, when expressed in an environment
that also includes the complementing portion of the marker protein, "complementation" takes
place and a detectable event occurs, assuming that the protein is properly folded and remains
soluble. This assay provides many advantages, including fidelity, sensitivity, ease of handling,
and ready adaptability.

1. Methods

There are three primary applications for the invention: screening of proteins for
suitability in recombinant polypeptide production, screening for mutants or domain boundaries
with altered folding and/or solubility profiles (e.g., diagnosis of disease), and screening for drugs
that modulate protein folding and/or solubility. In the first embodiment, the method includes the
steps of:

a) providing an expression construct comprising (i) a gene encoding a fusion protein,
said fusion protein comprising a protein of interest fused to a first segment of a
marker protein, wherein said first segment does not affect the folding or solubility
of the protein of interest, or affects it only in a systematic (i.e., predictable and
repeatable) manner, and (ii) a promoter active in said host cell and operably linked
to said gene;

b) expressing said fusion protein in a host cell that also expresses a second segment
of said marker protein, wherein said second segment is capable of structural
complementation with said first segment; and

c) determining structural complementation.



By comparing the degree of structural complementation in the method with that seen with
appropriate negative controls, changes in folding and/or solubility of said protein can be
determined. By looking at particular cell types from patients suspected of having particular
disease states, this general method of screening can be transformed into a specific diagnostic
method.
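
One way the comparison with negative controls might be carried out numerically is sketched
below in Python. The decision rule (sample mean above the negative-control mean plus three
standard deviations) is an assumption chosen for illustration and is not prescribed by the
method; the function name and the signal values are hypothetical.

```python
from statistics import mean, stdev


def complementation_call(sample_signals, negative_control_signals, n_sd=3.0):
    """Compare a marker-activity signal with negative controls.

    Complementation is called when the mean sample signal exceeds the mean of
    the negative controls by more than n_sd standard deviations. At least two
    negative-control readings are needed to estimate the standard deviation.
    """
    background = mean(negative_control_signals)
    threshold = background + n_sd * stdev(negative_control_signals)
    sample = mean(sample_signals)
    return {
        "background": background,
        "threshold": threshold,
        "sample": sample,
        "complementation_detected": sample > threshold,
    }


# Hypothetical marker-activity readings in arbitrary units.
negative_controls = [10.0, 12.0, 11.0]
soluble_fusion = complementation_call([50.0, 54.0], negative_controls)
insoluble_fusion = complementation_call([12.0, 13.0], negative_controls)
```

With these illustrative numbers the first fusion would be scored as complementing and the
second would not; a real assay would set the threshold from its own control data.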

In another embodiment, a method of screening for folding and/or solubility mutants is 
1 5 provided, and includes the steps of: 

a) providing a gene encoding a fusion protein comprising (i) a protein of interest and
(ii) a first segment of a marker protein, wherein said first segment does not affect
the folding or solubility of the protein of interest, or affects it only in a systematic
(i.e., predictable and repeatable) manner, wherein said fusion protein is not
properly folded and/or soluble when expressed in said host cell;

b) mutagenizing that portion of the gene encoding said protein of interest; 

c) expressing said fusion protein in a host cell that expresses a second segment of
said marker protein, wherein said second segment is capable of structural
complementation with said first segment; and

d) determining structural complementation. 



Again, a relative change in structural complementation, as compared to the structural
complementation observed with the unmutagenized fusion protein, indicates a change in proper
folding and/or solubility of said protein. An alternative embodiment involves the mutation of a
gene of interest prior to its fusion with the marker protein segment.

Finally, a third assay involves screening for candidate modulator substances that
modulate protein folding and/or solubility, including the steps of:






a) providing an expression construct comprising (i) a gene encoding a fusion protein,
said fusion protein comprising a protein of interest fused to a first segment of a
marker protein, wherein said first segment does not affect the folding or solubility
of the protein of interest, or affects it only in a systematic (i.e., predictable and
repeatable) manner, and (ii) a promoter active in said host cell and operably
linked to said gene;



b) expressing said fusion protein in a host cell that expresses a second segment of 
said marker protein, wherein said second segment is capable of structural 
complementation with said first segment; 



c) contacting the host cell with said candidate modulator substance; and 






d) determining structural complementation. 



Again, a relative change in structural complementation, as compared to the structural
complementation observed in the absence of said candidate modulator substance, indicates that
said candidate modulator substance is a modulator of protein folding and/or solubility.
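
By way of illustration only, the comparison made in step d) of the third assay can be expressed as a ratio of the complementation signal with and without the candidate substance. The Python sketch below assumes a numeric readout (for example, absorbance at 420 nm in an ONPG assay). The function name `classify_modulator` and the cutoff `min_fold_change` are hypothetical and are not taken from the present disclosure.

```python
def classify_modulator(signal_treated, signal_untreated, min_fold_change=1.5):
    """Classify a candidate substance by the relative change in complementation.

    signal_treated / signal_untreated are readouts of structural
    complementation with and without the candidate substance.
    min_fold_change is a hypothetical cutoff chosen only for illustration.
    Returns a (label, ratio) tuple.
    """
    if signal_untreated <= 0 or signal_treated < 0:
        raise ValueError("signals must be non-negative and the control must be > 0")
    if min_fold_change <= 1.0:
        raise ValueError("min_fold_change must be greater than 1")

    ratio = signal_treated / signal_untreated
    if ratio >= min_fold_change:
        label = "enhancer"      # more complementation than the untreated control
    elif ratio <= 1.0 / min_fold_change:
        label = "inhibitor"     # less complementation than the untreated control
    else:
        label = "no effect"     # change is within the chosen cutoff
    return label, ratio
```

In practice the cutoff would be set from the variability of replicate untreated controls rather than fixed in advance.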



2. Modulators 

As used herein the term "candidate substance" refers to any molecule that may potentially
inhibit or enhance protein folding and/or solubility. The candidate substance may be a protein or
fragment thereof, a small molecule, or even a nucleic acid molecule. Using lead compounds to
help develop improved compounds is known as "rational drug design" and includes not only
comparisons with known inhibitors and activators, but predictions relating to the structure of
target molecules.

The goal of rational drug design is to produce structural analogs of biologically active
polypeptides or target compounds. By creating such analogs, it is possible to fashion drugs
which are more active or stable than the natural molecules, which have different susceptibility to
alteration or which may affect the function of various other molecules. In one approach, one
would generate a three-dimensional structure for a target molecule, or a fragment thereof. This
could be accomplished by x-ray crystallography, computer modeling or by a combination of both
approaches.

It also is possible to use antibodies to ascertain the structure of a target compound
activator or inhibitor. In principle, this approach yields a pharmacore upon which subsequent
drug design can be based. It is possible to bypass protein crystallography altogether by
generating anti-idiotypic antibodies to a functional, pharmacologically active antibody. As a
mirror image of a mirror image, the binding site of anti-idiotype would be expected to be an
analog of the original antigen. The anti-idiotype could then be used to identify and isolate
peptides from banks of chemically- or biologically-produced peptides. Selected peptides would
then serve as the pharmacore. Anti-idiotypes may be generated using the methods described
herein for producing antibodies, using an antibody as the antigen.

On the other hand, one may simply acquire, from various commercial sources, small
molecule libraries that are believed to meet the basic criteria for useful drugs in an effort to
"brute force" the identification of useful compounds. Screening of such libraries, including
combinatorially generated libraries (e.g., peptide libraries), is a rapid and efficient way to screen
a large number of related (and unrelated) compounds for activity. Combinatorial approaches also
lend themselves to rapid evolution of potential drugs by the creation of second, third and fourth
generation compounds modeled on active, but otherwise undesirable compounds.

Candidate compounds may include fragments or parts of naturally-occurring compounds,
or may be found as active combinations of known compounds, which are otherwise inactive. It
is proposed that compounds isolated from natural sources, such as animals, bacteria, fungi, plant
sources, including leaves and bark, and marine samples may be assayed as candidates for the
presence of potentially useful pharmaceutical agents. It will be understood that the
pharmaceutical agents to be screened could also be derived or synthesized from chemical
compositions or man-made compounds. Thus, it is understood that the candidate substance
identified by the present invention may be peptide, polypeptide, polynucleotide, small molecule
inhibitors or any other compounds that may be designed through rational drug design starting
from known inhibitors or stimulators.

Other suitable modulators include antisense molecules, ribozymes, and antibodies
(including single chain antibodies), each of which would be specific for the target molecule.
Such compounds are described in greater detail elsewhere in this document. For example, an
antisense molecule that bound to a translational or transcriptional start site, or a splice junction,
would be an ideal candidate inhibitor.

In addition to the modulating compounds initially identified, the inventors also
contemplate that other sterically similar compounds may be formulated to mimic the key
portions of the structure of the modulators. Such compounds, which may include
peptidomimetics of peptide modulators, may be used in the same manner as the initial
modulators.


3. Assay Formats 

A quick, inexpensive and easy assay to run is an in vitro assay. Various cell lines can be
utilized for such screening assays, including cells specifically engineered for this purpose, as
discussed in detail above. Depending on the assay, culture may be required. The cell is
examined using α-complementation as a readout. Alternatively, molecular analysis may be
performed, for example, looking at protein expression, mRNA expression (including differential
display of whole cell or polyA RNA) and others.

In vivo assays involve the use of various animal models, including transgenic animals that
have been engineered to express both the fusion protein (target protein + first marker segment)
and the complementing molecule (second marker segment). Due to their size, ease of handling,
and information on their physiology and genetic make-up, mice are a preferred embodiment,
especially for transgenics. However, other animals are suitable as well, including insects,
nematodes, rats, rabbits, hamsters, guinea pigs, gerbils, woodchucks, cats, dogs, sheep, goats,
pigs, cows, horses and monkeys (including chimps, gibbons and baboons). Assays for
modulators may be conducted using an animal model derived from any of these species.

In such assays, one or more candidate substances are administered to an animal, and the
ability of the candidate substance(s) to alter protein folding and/or solubility, as compared to a
similar animal not treated with the candidate substance(s), identifies a modulator.

Treatment of these animals with candidate substances will involve the administration of
the compound, in an appropriate form, to the animal. Administration will be by any route that
could be utilized for clinical or non-clinical purposes, including but not limited to oral, nasal,
buccal, or even topical. Alternatively, administration may be by intratracheal instillation,
bronchial instillation, intradermal, subcutaneous, intramuscular, intraperitoneal or intravenous
injection. Specifically contemplated routes are systemic intravenous injection, regional
administration via blood or lymph supply, or directly to an affected site.

Determining the effectiveness of a compound in vivo may involve a variety of different
criteria. Also, measuring toxicity and dose response can be performed in animals in a more
meaningful fashion than in in vitro or in cyto assays.

4. High Throughput and Flow Cytometry 

High throughput formats are of particular use in drug screening. Flow cytometry
involves the separation of cells or other particles in a liquid sample based upon signals generated
in the host cells. Generally, the purpose of flow cytometry is to analyze the separated particles
for one or more characteristics thereof. The basic steps of flow cytometry involve the direction
of a fluid sample through an apparatus such that a liquid stream passes through a sensing region.
The particles should pass one at a time by the sensor and are categorized based on size, refraction,
light scattering, opacity, roughness, shape, fluorescence, etc.

Rapid quantitative analysis of cells proves useful in biomedical research and medicine.
Flow cytometers permit quantitative multiparameter analysis of cellular properties at rates of several
thousand cells per second. These instruments provide the ability to differentiate among cell
types. Data are often displayed in one-dimensional (histogram) or two-dimensional (contour
plot, scatter plot) frequency distributions of measured variables. The partitioning of
multiparameter data files involves consecutive use of the interactive one- or two-dimensional
graphics programs.



Quantitative analysis of multiparameter flow cytometric data for rapid cell detection
consists of two stages: cell class characterization and sample processing. In general, the process
of cell class characterization partitions the cell feature space into cells of interest and not of interest.
Then, in sample processing, each cell is classified in one of the two categories according to the
region in which it falls. Analysis of the class of cells is very important, as high detection
performance may be expected only if an appropriate characteristic of the cells is obtained.
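
The two stages described above can be sketched in Python as follows. This is a minimal illustration and not the analysis used by any particular instrument. It assumes two measured parameters per cell and a rectangular region of interest derived from a reference population. The names `Gate`, `characterize` and `process_sample` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """Rectangular region of interest in a two-parameter feature space."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def characterize(reference_events, margin=0.0):
    """Stage 1 (cell class characterization).

    Builds a gate as the bounding box of (x, y) measurements taken from a
    reference population of cells of interest, widened by an optional margin.
    """
    if not reference_events:
        raise ValueError("at least one reference event is required")
    xs = [x for x, _ in reference_events]
    ys = [y for _, y in reference_events]
    return Gate(min(xs) - margin, max(xs) + margin,
                min(ys) - margin, max(ys) + margin)


def process_sample(events, gate):
    """Stage 2 (sample processing).

    Classifies each (x, y) event as of interest (True) or not of interest
    (False) according to whether it falls inside the gate.
    """
    return [gate.contains(x, y) for x, y in events]
```

A bounding box is the simplest possible region; real analyses commonly use polygonal or elliptical gates drawn on the one- or two-dimensional displays mentioned above.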

Not only is cell analysis performed by flow cytometry, but so too is sorting of cells. In
U.S. Patent 3,826,364 (incorporated by reference), an apparatus is disclosed which physically
separates particles, such as functionally different cell types. In this machine, a laser provides
illumination which is focused on the stream of particles by a suitable lens or lens system so that
there is highly localized scatter from the particles therein. In addition, high intensity source
illumination is directed onto the stream of particles for the excitation of fluorescent particles in
the stream. Certain particles in the stream may be selectively charged and then separated by
deflecting them into designated receptacles. A classic form of this separation is via fluorescent
tagged antibodies, which are used to mark one or more cell types for separation.


Other methods for flow cytometry can be found in U.S. Patents 4,284,412; 4,989,977; 
4,498,766; 5,478,722; 4,857,451; 4,774,189; 4,767,206; 4,714,682; 5,160,974; and 4,661,913, all 
of which are incorporated by reference. 

G. Examples

The following examples are included to demonstrate preferred embodiments of the
invention. It should be appreciated by those of skill in the art that the techniques disclosed in the
examples which follow represent techniques discovered by the inventor to function well in the
practice of the invention, and thus can be considered to constitute preferred modes for its
practice. However, those of skill in the art should, in light of the present disclosure, appreciate
that many changes can be made in the specific embodiments which are disclosed and still obtain
a like or similar result without departing from the spirit and scope of the invention.

EXAMPLE 1: MATERIALS AND METHODS 

Antibodies, Chemicals and Expression Vectors

Monoclonal mouse anti-HA and polyclonal sheep anti-MBP antibodies were purchased
from BabCO (Richmond, CA). Horseradish peroxidase-conjugated (HRP) secondary antibodies
were from Jackson ImmunoResearch Laboratories (West Grove, PA).
Isopropyl-β-D-thiogalactopyranoside (IPTG) and
5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal)
were from Boehringer Mannheim (Indianapolis, IN). o-Nitrophenyl-β-D-galactopyranoside
(ONPG) was purchased from Sigma (St. Louis, MO). The expression vector pMAL-c2x, coding
for an MBP-α fusion, was from New England Biolabs (Beverly, MA). A plasmid containing
cDNA for the LivF protein of M. jannaschii (MJ1267) was obtained from the American Type
Culture Collection. Plasmid pAPP770 containing cDNA for the Alzheimer's precursor protein
(APP) was the generous gift of Dr. J. Herz, Dept. Molecular Genetics, UT Southwestern, Dallas,
TX. Plasmid pTRx.parallell containing cDNA for thioredoxin was the generous gift of Dr. K.
Gardner, Dept. Biochemistry, UT Southwestern, Dallas, TX. Plasmid pGex-2t containing cDNA
for glutathione S-transferase was from Amersham/Pharmacia (Piscataway, NJ).

Construction of α-Fusion Expression Vectors

Complementary DNA fragments coding for residues 404-644 (NBD1-B) and 419-655
(NBD1-D) of CFTR were excised using NdeI and XhoI from pET28a expression plasmids
generated as previously described (Qu & Thomas, 1996). Based upon homology to the recently
published HisP NBD crystal structure (Hung et al., 1998), these constructs are predicted to
contain the entire first NBD of CFTR. The resulting fragments were ligated into
NdeI/SalI-digested pMal-c2x in place of the maltose-binding protein (MBP), forming an
in-frame fusion with the α-fragment (residues 7-58 from full-length β-galactosidase).
Expression cassette PCR was used to assemble the other α-fusion constructs examined. The
MJ1267 cDNA was also subcloned into the NdeI and SalI sites of pMal-c2x. The resulting
vector contained an in-frame stop codon between MJ1267 and the polylinker of pMAL-c2x,
which was removed by site-directed mutagenesis, completing the α-fusion construct. TRx, GST
and Aβ (APP residues 1-42) were each ligated into NdeI/SacI-digested pMal-c2x. The cloning
strategy used to assemble the tandem Aβ/α-fusion construct, Aβ-rpt, was similar to that
described elsewhere (Culvenor et al., 1998), and utilized an internal EcoRI site to generate an
exact Aβ(1-42) repeat with no intervening sequence. All targets were subcloned in the pMal-c2x
vector and, therefore, utilize the same promoter. In addition, the ABC transporter NBDs
evaluated were also expressed in BL21 cells under the control of the T7 promoter of pET28a. In
each case, fidelity of PCR products and constructs was verified by restriction mapping and
DNA sequencing.

To serve as a marker for some of the expressed proteins (MJ1267, CFTR-NBD1, TRx,
GST and Aβ), an HA-tag sequence was introduced into the SalI site of the pMal-c2x expression
vector using two annealed complementary oligonucleotides coding for the tag sequence and
flanked by SalI linker sequences. Correct orientation of the resulting ligation products was
confirmed by DNA sequencing.

Site-directed mutagenesis

Oligonucleotide-directed mutagenesis using the QuikChange mutagenesis kit
(Stratagene, La Jolla, CA) was performed to generate the mutant MBP proteins in the expression
vector pMal-c2x. The sequences of the antisense mutagenic primers used are as follows:

G32D/I33P:
5'-GATGCTCAACGGTGACTTTAGGATCGGTATCTTCTCGAATTTC-3'

G32D:
5'-CAACGGTGACTTTAATATCGGTATCTTTCTCG-3'

I33P:
5'-GGTGACTTTAGGTCCGGTATCTTTCTCG-3'

Mutation incorporation was verified by DNA sequencing. Plasmid DNA was purified using 
reagents supplied by Qiagen Inc. 
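
Because the primers above are given as antisense sequences, it can be convenient to view the corresponding sense strand when checking a mutation against the coding sequence. The following Python sketch simply takes the reverse complement of each listed primer. The helper name `reverse_complement` is hypothetical, and the sketch makes no claim about how the primers were designed.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    seq = seq.upper()
    try:
        return "".join(_COMPLEMENT[base] for base in reversed(seq))
    except KeyError as exc:
        raise ValueError(f"unexpected base: {exc.args[0]!r}") from None


# Antisense mutagenic primers as listed in the text (5' to 3').
ANTISENSE_PRIMERS = {
    "G32D/I33P": "GATGCTCAACGGTGACTTTAGGATCGGTATCTTCTCGAATTTC",
    "G32D": "CAACGGTGACTTTAATATCGGTATCTTTCTCG",
    "I33P": "GGTGACTTTAGGTCCGGTATCTTTCTCG",
}

# Sense-strand view of each primer, useful for locating the altered codon.
SENSE_VIEW = {name: reverse_complement(p) for name, p in ANTISENSE_PRIMERS.items()}

if __name__ == "__main__":
    for name, sense in SENSE_VIEW.items():
        print(f"{name}: 5'-{sense}-3'")
```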

Expression of fusion proteins 

Expression constructs were transformed into DH5α E. coli by standard methods and
colonies selected on LB-agar plates supplemented with 100 µg/mL ampicillin (amp). From
single colonies, 10 mL LB + amp cultures were inoculated and allowed to grow overnight at
37°C. The following day, the overnight culture was diluted 1000-fold into a fresh 10 mL LB
+ amp culture and allowed to grow to mid log phase (OD600 ≈ 0.5). Protein production was
induced by the addition of IPTG to 0.3 mM and the cells were further incubated for the indicated
times.

In vitro assay of β-gal complementation

After the completion of fusion protein expression, cells (1.5 mL) were harvested by
centrifugation at 10,000 x g for two minutes. After removal of the supernatants, the cell pellets
were resuspended in 1 mL of buffer Z (10 mM KCl, 2.0 mM MgSO4, 100 mM NaHPO4, pH 7.0).
The cells were pelleted again, resuspended in 0.3 mL buffer Z and lysed by three freeze/thaw
cycles between liquid nitrogen and a 37°C water bath. Next, 0.1 mL of the resulting cell lysate
was transferred to a clean microfuge tube to which buffer Z (0.7 mL) supplemented with 0.27%
β-mercaptoethanol was added. Reactions were initiated by the addition of 160 µL of ONPG
solution (4.0 mg/mL dissolved in buffer Z) and incubated at 37°C for 10 min. Reactions were
quenched by the addition of 0.4 mL 1 M Na2CO3. Tubes were then centrifuged at 10,000 x g for
10 min to remove debris and the absorbance of the supernatant at 420 nm was measured.
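
As a hedged illustration of how the absorbance reading relates to enzyme activity, the Python sketch below converts an A420 value into micromoles of o-nitrophenol released per minute using the Beer-Lambert law. The volumes and the 10 min incubation are taken from the protocol above (0.1 mL lysate, 0.7 mL buffer Z, 0.16 mL ONPG and 0.4 mL Na2CO3, for 1.36 mL in total). The molar extinction coefficient and the 1 cm path length are assumptions that should be replaced by values calibrated for the instrument and buffer in use. The function name is hypothetical, and the sketch does not reproduce whatever per-cell normalization was used for the tabulated activities.

```python
def onpg_activity_umol_per_min(a420,
                               minutes=10.0,
                               total_volume_ml=0.1 + 0.7 + 0.16 + 0.4,
                               epsilon_per_m_cm=4500.0,
                               path_cm=1.0):
    """Estimate micromoles of o-nitrophenol formed per minute.

    a420             absorbance of the cleared supernatant at 420 nm
    minutes          incubation time before quenching (10 min in the protocol)
    total_volume_ml  final quenched reaction volume (1.36 mL in the protocol)
    epsilon_per_m_cm ASSUMED molar extinction coefficient of o-nitrophenol
    path_cm          ASSUMED cuvette path length
    """
    if minutes <= 0 or total_volume_ml <= 0 or epsilon_per_m_cm <= 0 or path_cm <= 0:
        raise ValueError("time, volume, epsilon and path length must be positive")
    concentration_molar = a420 / (epsilon_per_m_cm * path_cm)   # Beer-Lambert law
    micromoles = concentration_molar * (total_volume_ml / 1000.0) * 1e6
    return micromoles / minutes
```

A blank reaction without lysate would normally be subtracted from the A420 reading before applying the conversion.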

Analysis of soluble and insoluble fractions 

To biochemically analyze the solubility characteristics of the expressed fusion proteins, 3
mL of culture from cells induced for the indicated times was harvested by centrifugation, washed
once and resuspended in 600 µL lysis solution (100 mM NaCl, 1 mM EDTA, 50 mM Tris-Cl, pH
7.6). The cell suspensions were lysed by sonication three times for 30 sec at 50% duty cycle
and power output of 4 using a Branson model 450 sonifier fit with a microtip probe. All
manipulations were carried out on ice. After sonication, the solution was centrifuged to separate
soluble and insoluble fractions at 10,000 x g in a microfuge at 4°C for 10 min. Supernatant and
pellet fractions were analyzed by SDS PAGE and Western blotting where appropriate.

SDS PAGE and Western blotting

Expressed proteins were analyzed by electrophoresis through 10% Tricine-SDS
polyacrylamide gels using the buffer system of Schagger and von Jagow (1987). Protein bands
were visualized by staining with Coomassie blue. For Western immunoblotting, standard
methods were employed for transfer of proteins from gels to nitrocellulose. Resulting
membranes were blocked in TBS containing Tween-20 and 10% dehydrated milk for at least 1 hr
and incubated at room temperature with the indicated primary antibodies. Immunoreactive
bands were visualized by ECL (Amersham, Piscataway, NJ) using appropriate HRP-conjugated
secondary antibodies and X-ray film. The density of bands on Coomassie-stained gels and
exposed X-ray film was measured on an Agfa Arcus scanner and quantified using Molecular
Analyst software (BioRad, Hercules, CA).

Blue/white screening for β-gal complementation

Single colonies of DH5α containing the individual expression constructs were analyzed
for the ability of the α-fusion proteins to complement β-gal activity in vivo. Bacteria harboring
each construct were streaked to single colonies on LB-agar plates supplemented with 100 µg/mL
ampicillin, 80 µg/mL X-gal, and 0.1 mM IPTG. The plates were incubated at 37°C for 18 to 48
hr and activity of β-gal was assessed by visualization of blue color in α-complementing colonies.

Colorimetric screening for β-gal complementation in 96-well plates

Cells harboring each of the indicated expression constructs were grown to mid log phase
(OD600 ≈ 0.5) from overnight cultures as described above. 125 µL of each culture was transferred
to individual wells of a flat-bottom 96-well plate containing 125 µL LB media supplemented with
100 µg/mL ampicillin and 0.6 mM IPTG (resulting in a final [IPTG] of 0.3 mM). The plates
were then placed on an orbit shaker at 37°C with rapid shaking. After induction for 1 hr, X-gal
was added to a final concentration of 80 µg/mL, and the plate was returned to the shaker at 37°C
overnight.
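
The dilution arithmetic in the 96-well protocol can be checked with a short Python sketch. Mixing 125 µL of culture with 125 µL of medium containing 0.6 mM IPTG gives 0.3 mM, as stated above. The second helper estimates how much concentrated stock must be added to reach a target final concentration. The 20 mg/mL X-gal stock used in the usage note is an assumption for illustration only, and both function names are hypothetical.

```python
def final_concentration(stock_conc, stock_volume, other_volume):
    """Concentration after mixing stock_volume of a solution at stock_conc
    with other_volume of liquid that contains none of the solute."""
    total = stock_volume + other_volume
    if total <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_volume / total


def volume_to_add(target_conc, current_volume, stock_conc):
    """Volume of stock to add to current_volume so that the mixture reaches
    target_conc. Concentrations share one unit, as do volumes."""
    if stock_conc <= target_conc:
        raise ValueError("stock must be more concentrated than the target")
    return target_conc * current_volume / (stock_conc - target_conc)


if __name__ == "__main__":
    # 125 uL culture + 125 uL medium containing 0.6 mM IPTG
    print(final_concentration(0.6, 125.0, 125.0))
    # uL of an ASSUMED 20 mg/mL (20000 ug/mL) X-gal stock for 80 ug/mL in 250 uL
    print(volume_to_add(80.0, 250.0, 20000.0))
```

With the assumed stock, roughly 1 µL per well would be needed, which suggests working from a diluted intermediate stock for accurate pipetting.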

EXAMPLE 2: RESULTS 

In order to test the ability of α-fragment chimeras to complement the ω-fragment of β-gal
and report target protein solubility by producing active β-gal, model polypeptides were fused to
the N-terminus of the α-fragment in an inducible bacterial expression plasmid (FIG. 1B). Initial
experiments focused on the maltose binding protein (MBP) of E. coli. MBP is normally secreted
into the periplasm of E. coli; however, the construct used in the present study lacks the required
leader sequence and therefore folds in the cytoplasm, where the ω-fragment is located.

To assess the relative abilities of the expressed α-fusion proteins to complement β-gal
activity in vivo, E. coli harboring the fusion expression constructs were plated on IPTG/X-gal
indicator plates and the development of blue color in resulting colonies was monitored.
pUC19-transformed DH5α E. coli, which express a 54-residue α-fragment (residues 6-59 of β-gal), are
the most intensely blue. This represents the level of β-gal complementation attributable to the
α-fragment alone. The MBP-α fusion protein (MBP residues 1-366; α: residues 7-58 of β-gal)
also yields significant α-complementation, although less than observed for pUC19
(Yanisch-Perron et al., 1985).

Previously, several mutations were identified which lead to diminished solubility and
reduced periplasmic yield of MBP (Betton & Hofnung, 1996). For example, mutation of two
residues, I33P and G32D, decreased soluble periplasmic MBP by more than 100-fold. This
double mutation was introduced into the MBP/α fusion construct, which was then monitored for
α-complementation on indicator plates. The wild-type MBP and the double mutant expressed at
equivalent levels. Consistent with the previously reported effect of these mutations on the in
vivo solubility of MBP, the G32D/I33P double mutation significantly impaired the solubility and,
thus, the ability of the fusion protein to complement β-gal activity on indicator plates.

To test the generality of the assay system, a series of α-fusion constructs were generated.
Fusion to α of either TRx or GST (two highly soluble proteins used regularly as fusions to aid in
the solubility of ill-behaved partners) and expression in DH5α on indicator plates results in blue
color development that is as intense as that observed for the MBP/α fusion construct. Next, a
series of nucleotide binding domains (NBD) from two ATP binding-cassette (ABC) transporters
were generated and examined. Two are polypeptides predicted to include the first NBD of the
cystic fibrosis transmembrane conductance regulator (CFTR): NBD1-B (CFTR residues
404-644), and NBD1-D (CFTR residues 419-655). This domain has poor solubility properties due
either to inherently limited solubility in the absence of other domains of the protein with which it
normally interacts, or to marginal stability/misfolding or both. Several mutations within this
domain prevent proper folding of the full length CFTR in vivo and, thus, lead to cystic fibrosis.
The third NBD, LivF (MJ1267), is a subunit of the branched chain amino acid transporter from
the hyperthermophilic archaeon M. jannaschii. CFTR NBD1 has been shown to be insoluble,
forming inclusion bodies when expressed in E. coli (Qu & Thomas, 1996), unless fused to a
soluble protein such as wild-type MBP (Ko et al., 1993) or GST (King & Sorscher, 1998).
MJ1267, however, has proven much more soluble, yielding 10% soluble protein from a T7
expression system in BL21 E. coli.

When expressed in DH5α on indicator plates, both CFTR NBD/α fusions result in very
little blue color, even after 48 hr of growth, although the NBD1-D/α fusion appears to
complement measurably more than NBD1-B. By contrast, expression of the MJ1267/α fusion
results in a significantly elevated level of blue color when compared to either of the CFTR
NBD/α fusion proteins. The MBP/α fusion proteins express at higher levels than the NBD/α
fusions as a group, and thus yield more activity. It should be noted that relative levels of
α-complementation, as evidenced by blue color on indicator plates, can be observed at the single
colony level for each of the constructs tested, providing a measure that is independent of plated
cell density.

To test whether the α-complementation assay is adaptable to a format amenable to
rapid-throughput screening, the constructs described above were analyzed for the development of blue
color in a 96-well plate β-gal assay. The levels of blue color obtained in the microtiter plate
assay for each construct agree well with those obtained in the agar plate assay. In fact, the
difference in color levels observed upon comparison of the two CFTR-NBD/α-fusions is more
apparent in the 96-well plate assay.

To verify the hypothesis that the intensity of blue color on indicator plates is reporting
target protein solubility, the amount of soluble versus insoluble protein was measured in
biochemical fractionation experiments. E. coli expressing wild-type, G32D, I33P, and
G32D/I33P-MBP/α fusions were subjected to cell disruption and fractionation by centrifugation.
Analysis by SDS PAGE of the soluble and insoluble fractions for each fusion protein revealed a
correlation between solubility and level of blue color on X-gal plates. It is important to note that
the agar plate β-gal assay, after long incubation times, is most sensitive to changes from
insoluble to higher levels of solubility, the range of greatest practical utility. The wild-type
MBP/α fusion fractionates primarily to the supernatant, while the double mutant (G32D/I33P)
fractionates primarily to the pellet. Fractionation results were further confirmed by Western
blots probed with anti-MBP antibodies. The fraction of MBP/α fusions that are soluble is in
agreement with the previously published stability and folding yield of these mutants without the
α-fragment marker (Betton & Hofnung, 1996). This suggests that the α-fragment does not
significantly impact the overall solubility characteristics of the MBP fusion proteins and is
therefore a good reporter of target protein solubility. Similarly, the high levels of blue color
observed for the GST/α and TRx/α fusions correlate well with the biochemical fractionation
experiments, which indicate that a majority of both of these proteins partitions to the soluble fraction.

A correlation between the biochemical solubility and α-complementation (as indicated by
blue color of colonies in the plate assays) also was demonstrated for the NBD/α fusion
constructs. Both CFTR NBD/α fusion proteins exhibit little to no blue color, and virtually all of
the fusion protein partitions to the insoluble fraction whether expressed with (DH5α expression)
or without (BL21 expression) the α-fragment. In contrast, MJ1267, when expressed as an
α-fragment fusion, produces a significantly higher level of blue color relative to either of the
CFTR-NBD/α fusions. This correlates with the partial solubility of MJ1267 either with (DH5α
expression) or without (BL21 expression) the α-fragment. Taken together, these results
suggest that in these cases, the relatively small α-fragment, when fused to a target polypeptide,
does not have large effects on the target's solubility, neither increasing that of the otherwise
insoluble targets (CFTR-NBDs), nor decreasing that of the partially soluble one (MJ1267).

A quantitative measure of α-complementation of β-gal by each of the fusion targets was
obtained by the direct measurement of activity in cell lysates. A total of four MBP folding
variants were utilized to establish the quantitative relationship within a target system between
β-gal activity and biochemical solubility. Table 3 summarizes the results of these in vitro enzyme
assays.



TABLE 3

Target Protein        β-gal Activity (units/cell)

MBP wild-type         102 +/- 19
G32D                   94 +/- 21
I33P                   46 +/- 12
G32D/I33P              14 +/- 3
GST                   134 +/- 8
TRx                   159 +/- 14
CFTR NBD1-B             5 +/- 1
CFTR NBD1-D             6 +/- 2
MJ1267 (LivF)          12 +/- 6


A unit of β-gal activity is defined as the amount of enzyme required to hydrolyze one µmole of
ONPG to o-nitrophenol and D-galactose per minute. Note that the polylinker between MBP (and
mutants thereof) and the α-fragment is 36 residues in length. This linker was reduced to 9 residues
during construction of the CFTR-, LivF-, GST-, and TRx-α fusion constructs.

Activity correlates well with the relative levels of blue color observed for these constructs. The
plate assay is less able to distinguish highly soluble targets from those of intermediate solubility
(MBP single mutants), most likely due to integration of the signal during growth of the colonies.
FIG. 2 shows a linear relationship between the enzymatic activity (Table 3) and the biochemical
soluble fraction for each of the MBP/α fusions as assessed by densitometry of Coomassie-stained
gels. Again, the activities show a linear correlation with the periplasmic folding yields
for the unfused MBPs reported by Betton and Hofnung (1996), further supporting the assay's
ability to report on the intrinsic folding/solubility properties of the target proteins. The differing
magnitude of the effects reported here when compared with those previously reported by Betton
and Hofnung (1996) may reflect the cellular environments where folding takes place, since the
present constructs must fold in the cytoplasm.
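
To make the relative effect of each MBP mutation easier to read, the mean activities of the four MBP variants in Table 3 can be expressed as a fraction of the wild-type value. The Python sketch below performs only this normalization on the tabulated means. It does not reproduce FIG. 2, because the soluble-fraction values from densitometry are not listed in the text. The names `TABLE_3_MBP` and `relative_activity` are hypothetical.

```python
# Mean and reported spread of beta-gal activity (units/cell) for the four
# MBP variants, as listed in Table 3.
TABLE_3_MBP = {
    "MBP wild-type": (102.0, 19.0),
    "G32D": (94.0, 21.0),
    "I33P": (46.0, 12.0),
    "G32D/I33P": (14.0, 3.0),
}


def relative_activity(table, reference="MBP wild-type"):
    """Return each variant's mean activity as a fraction of the reference mean."""
    ref_mean, _ = table[reference]
    if ref_mean <= 0:
        raise ValueError("reference activity must be positive")
    return {name: mean / ref_mean for name, (mean, _) in table.items()}


if __name__ == "__main__":
    for name, fraction in relative_activity(TABLE_3_MBP).items():
        print(f"{name}: {fraction:.2f} of wild-type")
```

On these means the double mutant retains roughly one seventh of the wild-type activity, while G32D alone remains close to wild-type.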

In addition to cystic fibrosis, many other human diseases are associated with
inappropriate folding and/or aggregation of proteins (Thomas et al., 1995; Tan & Pepys, 1994;
Wells & Warren, 1998). To test whether the structural complementation assay has application to
such proteins, the Alzheimer's Aβ(1-42) peptide, which forms insoluble fibrils in the brains of
affected individuals, was selected as an additional test case. When fused to the α-fragment and
expressed in E. coli on indicator plates, the fusion protein is unable to efficiently complement
β-gal activity, resulting in very little development of blue color. In contrast, mutation of
phenylalanine to proline at position 19 of Aβ (F19P), a mutation known to retard fibril formation
in vitro (Wood et al., 1995), results in a clear and measurable increase in blue color on indicator
plates, approximately a three-fold increase in β-gal activity, and increased fusion protein in the
soluble fraction at equivalent levels of expression. Recently, Culvenor and co-workers reported
the production of "large intracellular deposits" of Aβ-immunoreactive material upon the
expression of Aβ(1-42) as a tandem head-to-tail duplex in yeast (Culvenor et al., 1998). To
assess the ability of this assay to report on the solubility state of such a construct, the inventors
assembled and expressed a tandem repeat of Aβ as a fusion with the α-fragment (Aβ-rpt).
Colonies expressing the Aβ-rpt/α fusion protein exhibit no detectable blue color on indicator
plates and in vitro β-gal activity less than that observed for the wild-type Aβ/α fusion, and no
protein is detectable in the soluble fraction. Interestingly, the Aβ-rpt protein aggregates to form
a ladder of increasingly higher molecular weight insoluble species, a property absent from the
single Aβ/α fusion and perhaps more reflective of the disease condition.

All of the compositions and/or methods disclosed and claimed herein can be made and
executed without undue experimentation in light of the present disclosure. While the
compositions and methods of this invention have been described in terms of preferred
embodiments, it will be apparent to those of skill in the art that variations may be applied to the
compositions and/or methods and in the steps or in the sequence of steps of the method described
herein without departing from the concept, spirit and scope of the invention. More specifically,
it will be apparent that certain agents which are both chemically and physiologically related may
be substituted for the agents described herein while the same or similar results would be
achieved. All such similar substitutes and modifications apparent to those skilled in the art are
deemed to be within the spirit, scope and concept of the invention as defined by the appended
claims.